HYPOTHESIS TESTING USING SPATIALLY DEPENDENT HEAVY-TAILED MULTISENSOR DATA

ABSTRACT 

The  detection  of  spatially  dependent  heavy-tailed  signals  is  considered  in  this  dissertation.  While  the  central  limit 
theorem,  and  its  implication  of  asymptotic  normality  of  interacting  random  processes,  is  generally  useful  for  the 
theoretical  characterization  of  a  wide  variety  of  natural  and  man-made  signals,  sensor  data  from  many  different 
applications, in fact, are characterized by non-Gaussian distributions. A common characteristic observed in
non-Gaussian data is the presence of heavy tails, or fat tails. For such data, the probability density function
(p.d.f.) decays at a slower-than-exponential rate in the tails, implying that extreme events occur with greater
probability than under distributions with exponentially decaying tails.
When  these  events  are  observed  simultaneously  by  several  sensors,  their  observations  are  also  spatially  dependent.  In 
this  dissertation,  we  develop  the  theory  of  detection  for  such  data,  obtained  through  heterogeneous  sensors.  In  order 
to  validate  our  theoretical  results  and  proposed  algorithms,  we  collect  and  analyze  the  behavior  of  indoor  footstep 
data  using  a  linear  array  of  seismic  sensors.  We  characterize  the  inter-sensor  dependence  using  copula  theory. 

Copulas are parametric functions that bind univariate p.d.f.s to generate a valid joint p.d.f.

We  model  the  heavy-tailed  data  using  the  class  of  alpha-stable  distributions.  We  consider  a  two-sided  test  in  the 
Neyman-Pearson framework and present an asymptotic analysis of the generalized likelihood ratio test (GLRT).
Both nested and non-nested models are considered in the analysis. We also use a likelihood maximization-based copula
selection  scheme  as  an  integral  part  of  the  detection  process.  Since  many  types  of  copula  functions  are  available  in 
the  literature,  selecting  the  appropriate  copula  becomes  an  important  component  of  the  detection  problem.  The 
performance  of  the  proposed  scheme  is  evaluated  numerically  on  simulated  data,  as  well  as  using  indoor  seismic  data. 
With appropriately selected models, our results demonstrate that a high probability of detection can be achieved for
false alarm probabilities of the order of 10^-4.

These  results,  using  dependent  alpha-stable  signals,  are  presented  for  a  two-sensor  case.  We  identify  the 
computational  challenges  associated  with  dependent  alpha-stable  modeling  and  propose  alternative  schemes  to 
extend the detector design to a multisensor (multivariate) setting. We use a hierarchical tree-based approach, called
vines, to model the multivariate copulas, i.e., to model the spatial dependence between multiple sensors. The
performance of the proposed detectors under the vine-based scheme is evaluated on the indoor footstep data, and
significant improvement is observed when compared against the case when only two sensors are deployed. Some
open  research  issues  are  identified  and  discussed. 


Chapter 1

Introduction 


Our lives today are constantly aided and enriched by various types of sensors, which are
deployed ubiquitously. They perform different roles, depending on the context of their deployment.
For example, in modern mobile devices, GPS information is commonly overlaid on image data,
and this forms the basis of an augmented reality system. When deployed in the living spaces
we occupy, sensors such as CO2 and infrared modalities can be used indoors, at the front end
of an energy-aware intelligent indoor environmental control system. Traffic cameras and GPS
sensors can be used outdoors to help drivers navigate busy rush-hour traffic.

In  each  of  the  above  applications,  sensors  of  different  types,  i.e.,  heterogeneous  sensors, 
are  used  to  make  complex  inferences  about  an  underlying  observed  process.  This  is  similar, 
in many ways, to how we, as humans, combine or fuse different streams of information
originating from our sense organs. Over the past two decades, the field of information fusion has
been  extensively  studied  and  researched.  Although  there  exists  a  rich  body  of  literature,  the 
increasing  complexity  of  systems  as  well  as  the  vast  diversity  of  applications  require  constant 
revision  to  existing  technologies  and  continued  research  in  this  area. 


In many inference applications, it is sufficient to deploy sensors, such as seismic or acoustic
modalities, which are capable of providing one-dimensional time-series data. Sensing modalities
such as video or infrared cameras have the ability to provide a richer quality of information,
but  are  either  not  practical  to  deploy,  or  have  other  constraints  that  do  not  permit  their  use 
in  certain  applications.  For  example,  in  an  urban  combat  scenario,  soldiers  may  require  the 
surveillance  of  cleared  buildings.  For  this  application,  the  use  of  video  cameras  may  either 
require  a  deployment  and  setup  time  which  is  not  available,  or  there  may  exist  critical  areas 
that  need  monitoring  but  are  occluded  from  a  camera’s  field  of  view.  When  sensors  are  used  for 
patient  monitoring  in  hospitals,  it  is  quite  common  to  have  situations  where  privacy  concerns 
preclude  the  use  of  a  video  or  similar  imaging  modality.  In  this  dissertation,  we  are  motivated 
by  such  applications  and,  in  particular,  develop  appropriate  theory,  and  for  validation,  apply  it 
to  the  data  obtained  from  seismic  sensors  deployed  for  indoor  personnel  monitoring. 

The  outcome  of  the  information  fusion  process  is,  usually,  some  form  of  inference  about 
the  scene  or  phenomenon  being  observed.  The  phenomenon  is  context  specific  and,  therefore, 
varies  with  the  application  being  considered,  e.g.,  personnel  movement  for  surveillance,  patient 
health  in  a  health-care  facility  or  habitability  of  the  room  for  an  indoor  environment  control 
application.  The  inference  tasks  could  consist  of  detecting  or  estimating  some  parameters, 
such  as  locations  or  tracks,  that  provide  information  for  situational  awareness.  The  inferred 
parameters are a function of the specific model being considered, and emerge from the context
or application under consideration.

Data from sensors typically exhibit information heterogeneity that can arise from a wide
variety of causes. The sensors deployed in a given region of interest, in the most general
setting, may consist of rather disparate and incommensurate modalities. Even sensors of the same
modality may exhibit differences in their sensing ability, due to differences in manufacturing,
quality control, or the duration and location of their deployment. Since these sensors
observe different aspects of the same phenomenon, their observations are also dependent.
The nature of this dependence can be quite complex and nonlinear, especially in cases where
the signal propagates through a non-homogeneous medium. Additionally, the nature of the
phenomenon, as well as the medium, can potentially result in non-Gaussian sensor measurements.

Fault tolerance and enhanced performance are key systemic advantages that result from
fusing heterogeneous information sources because of the diversity, redundancy and increased
coverage that they provide. As a consequence of heterogeneity, the quality and quantity of
information provided by each sensing “modality”, which can potentially include human
intelligence, vary with each source. In this sense, the words “sensor” and “node” are used
interchangeably here and refer to any source of data. Note that while local observations and
inferences from a group of heterogeneous sensors monitoring the same phenomenon may exhibit
statistical dependence, they still provide different characterizations of the phenomenon under
observation. Thus, the entire network does not fail as a result of one modality being
compromised. However, an accurate characterization of the inter-modality dependence is necessary
for making reliable system-wide inference.

The above considerations are central to the ideas explored in this dissertation. We
primarily investigate detection problems, from an information fusion perspective, when sensor
observations are heterogeneous, dependent and heavy-tailed (non-Gaussian). Throughout this
dissertation, we use footstep detection as an example application. For this, we consider
indoor seismic signals that we have collected using geophone sensors. In the following sections,
we systematically introduce the main ideas related to information fusion using heterogeneous,
dependent, heavy-tailed multisensor data.

1.1  Statistical  approach  to  information  fusion 

The typical information fusion problem consists of a suite of networked or non-networked
“sensors” that are deployed in a region of interest (ROI). The word “sensor” is used to include
not only physical sensors, but any source capable of providing information based on its
observations of a phenomenon occurring within the ROI. Therefore, local decision makers such as
human agents are also considered to be sensors. Additionally, when monitoring a phenomenon
of interest, the suite of sensors may consist of heterogeneous sensors.

The statistical approach to information fusion considers that a fusion center (FC) receives
data from $L$ sensors, where the data are characterized using a probabilistic model. The nature
of the problem being considered, together with the model specification, determines the specific
inference scheme employed by the fusion center. From each of the $L$ sensors, the fusion center
receives a sequence of $N$ observations, $x_{ij}$, $i = 1, 2, \ldots, L$, $j = 1, \ldots, N$. Any
inference process can use the data either sequentially, one observation at a time with an
appropriate update rule, or take a one-shot approach, where a single block of $N \times L$
observations is used for inference. For the algorithms and techniques proposed in this
dissertation, we consider a one-shot approach.

Each $x_{ij}$ is a realization of the random variable $X_i$. In this dissertation, we consider that
the random variables are independent and identically distributed over the index $j$, but are
dependent over the index $i$. These $X_i$, in the most general setting, may represent analog
(unquantized) data, soft decisions or quantized data, or 1-bit local (hard) decisions. This
notation, therefore, accommodates raw sensor observations as well as data obtained after local or
sensor-level processing.
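
The observation model above can be made concrete with a small simulation. The following is only a minimal sketch under stated assumptions, not the data model used in this dissertation: it assumes a hypothetical generator in which every sensor sees the same heavy-tailed shock (Student-t with 3 degrees of freedom) plus its own independent Gaussian noise. The names `simulate_block`, `dof` and `gains` are illustrative.

```python
import numpy as np

def simulate_block(L, N, dof=3.0, seed=0):
    """Return an L x N array x[i, j]: sensor index i, observation index j.

    Columns (index j) are i.i.d. draws; rows (index i) are dependent
    because every sensor sees the same heavy-tailed shock.
    This generator is an illustrative assumption, not the source's model.
    """
    rng = np.random.default_rng(seed)
    shock = rng.standard_t(dof, size=N)          # common to all sensors
    gains = np.linspace(1.0, 2.0, L)[:, None]    # hypothetical sensor-specific scaling
    noise = rng.standard_normal((L, N))          # independent across i and j
    return gains * shock[None, :] + noise

x = simulate_block(L=4, N=5000)
print(x.shape)                                   # (4, 5000)
# Sample correlation between two sensors (dependence over the index i).
print(np.corrcoef(x[0], x[1])[0, 1])
```

In a one-shot scheme, the whole array `x` would be handed to the fusion rule at once, rather than one column at a time.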

When data are quantized locally to a 1-bit resolution, they often represent the case when
sensors have additional processing capability to take local decisions. From the fusion
perspective, this is also known as decision-level fusion. Local sensor-level processing which
results in either a one-bit or $M$-bit output (with small $M$, i.e., coarse quantization) is often
used in wireless sensor networks (WSN). A typical WSN is a network comprising power and
bandwidth constrained sensors as the nodes of the network; these sensors transmit their hard or
soft decisions to the fusion center through a wireless channel.
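
As a rough illustration of decision-level fusion, the sketch below has each sensor reduce its observation to one bit and the fusion center apply a counting ("k-out-of-L") rule. Both the magnitude-threshold local test and the counting rule are assumptions chosen for brevity; they are not the detectors developed in this dissertation, and all numerical values are hypothetical.

```python
def local_decision(observation, threshold):
    """1-bit (hard) decision made at a sensor: 1 if the magnitude exceeds the threshold."""
    return 1 if abs(observation) > threshold else 0

def fuse_k_out_of_l(decisions, k):
    """Counting rule at the fusion center: declare 1 if at least k sensors report 1."""
    return 1 if sum(decisions) >= k else 0

observations = [0.2, 3.1, -2.7, 0.4]   # hypothetical readings, one per sensor
thresholds = [1.0, 1.0, 1.5, 1.0]      # hypothetical per-sensor thresholds
bits = [local_decision(x, t) for x, t in zip(observations, thresholds)]
print(bits)                            # [0, 1, 1, 0]
print(fuse_k_out_of_l(bits, k=2))      # 1
```

A counting rule treats the received bits symmetrically and ignores any dependence between sensors, which is one reason dependence modeling matters in the design of fusion rules.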

Feature-level fusion refers to the case when sensors have the computational resources for
complex signal processing, such that some descriptive features may be extracted from the raw
data. Such features may include likelihood values, spectral coefficients, or the coefficients
obtained from some other transform domain operation. Classification problems often employ
such features which, when appropriately designed, provide a multidimensional basis for
discriminating between the various classes under consideration. Feature extraction is also an
effective way to process signals which, in their raw unprocessed form, may be incommensurate.
When $X_i$ represents unprocessed observations, such a fusion scheme is referred to as data-level
fusion. This also represents the case where there is no decentralization of the decision-making
process. Effectively, the fusion center is the only processing unit in the entire system.

Irrespective of the fusion levels or the nature of quantization, the design and analysis of a
fusion method, from a statistical perspective, requires the probabilistic specification of $X_i$.
The sensor model for the $i$-th sensor is, for analog data, the univariate probability density
function (p.d.f.) or, for quantized data, the probability mass function (p.m.f.) of $X_i$. The
sensors may be deployed in various spatial configurations or topologies. In this dissertation, we
assume that the sensors send their observations to the fusion center in parallel, without
communicating with each other. This architecture is called the parallel fusion architecture in
the distributed inference literature (see Fig. 1.1). Here, each sensor is depicted by a different
shape: this is to indicate that the sensors could be of possibly different modalities.

The FC applies a fusion rule, which is a function defined on all $X_i$ and determines a final
decision or parameter value, based on the inference task considered. An optimal fusion rule
typically optimizes a cost function, which is defined for the entire system. Note that in the
preceding discussion, we do not specify the statistical nature of the sensor output; we only
specify that all $X_i$ are the input to the fusion center. That is, any distortions of the
sensor’s output (e.g., additive noise, fading, channel attenuation, etc.) are not modeled
separately and are accommodated within $X_i$.

Fig. 1.1: Parallel topology for data fusion. Different shapes imply different sensor modalities.

1.2  Dependence  modeling 

When sensors observe a common phenomenon, as shown in Fig. 1.1, their measurements
often exhibit spatial statistical dependence. This dependence may emerge in spite of sensors
observing the phenomenon of interest as independent observers. For example, signals may be
modeled as being embedded in additive correlated noise. Such a model is typically useful when
sensors are deployed close to each other, e.g., in an acoustic array or a closely spaced array of
antenna elements. When measurements are made using physical sensors, the relevant signals
emerging from the source or phenomenon propagate through a common physical medium
before they are incident at the sensor. When the medium of propagation is non-homogeneous, the
dependence structure between any $X_i$ and $X_{i'}$, $i \neq i'$, can be significantly nonlinear.
Hence, the commonly used second-order measure, the correlation coefficient, becomes an inadequate
measure of statistical dependence.
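
A toy example shows why the correlation coefficient can miss nonlinear dependence. The sketch below is an assumption made purely for illustration: one "sensor" reading is a deterministic, symmetric nonlinear function of the other. It is not derived from the seismic data used in this dissertation.

```python
import numpy as np

x = np.linspace(-3.0, 3.0, 2001)   # symmetric grid standing in for sensor 1
y = x ** 2                         # sensor 2 is a deterministic function of sensor 1

pearson_xy = np.corrcoef(x, y)[0, 1]
pearson_absx_y = np.corrcoef(np.abs(x), y)[0, 1]

print(f"corr(x, y)   = {pearson_xy:.3f}")
print(f"corr(|x|, y) = {pearson_absx_y:.3f}")
```

Although `y` is completely determined by `x`, the correlation between them is numerically zero because the relationship is symmetric about the origin; the dependence only becomes visible to a second-order measure after the transformation `|x|`.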

The issue of statistical dependence is even more complex when different sensor modalities
are used. An observed phenomenon may give rise to disparate or incommensurate processes
which are sensed and measured differently by modalities sensitive to the signals from those
respective processes. For example, consider the phenomenon where acoustic and video
modalities observe a person talking. Here, although the acoustic and video data are not coupled
via a shared medium of propagation, features extracted from voice data (acoustic sensor) and
image sequences of lip movements (video sensor) are statistically dependent. In this example,
dependence is induced by the phenomenon. A multimodal deployment necessarily implies
heterogeneity. Suppose, for $1 \le i \neq i' \le L$, the p.d.f.s of $X_i$ and $X_{i'}$ are denoted
by $f_{X_i}$ and $f_{X_{i'}}$; sensors $i$ and $i'$ are heterogeneous if $f_{X_i} \neq f_{X_{i'}}$.
Note that if sensor $i$ is an acoustic modality and $i'$ provides video data, $f_{X_i}$ and
$f_{X_{i'}}$ may not be defined on the same support. This is an additional layer of complexity
when modeling the joint p.d.f. $f_{X_i, X_{i'}}$. The joint distribution of sensor measurements is
necessary for any inference task. In this dissertation, copulas, discussed in detail in Chapter 3,
are used to construct valid joint distributions describing possibly nonlinear dependence
structures, such that the $X_i$ can be heterogeneous.
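
To make the copula idea tangible, the sketch below builds a bivariate joint p.d.f. from two different marginals using the standard identity f(x1, x2) = c(F1(x1), F2(x2)) f1(x1) f2(x2), where c is a copula density. The choices here are assumptions for illustration only: a Gaussian copula with parameter `rho`, a standard Cauchy marginal for one sensor and a standard Laplace marginal for the other. They are not the models selected in this dissertation.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def cauchy_pdf(x):
    return 1.0 / (math.pi * (1.0 + x * x))

def cauchy_cdf(x):
    return 0.5 + math.atan(x) / math.pi

def laplace_pdf(x):
    return 0.5 * math.exp(-abs(x))

def laplace_cdf(x):
    return 0.5 * math.exp(x) if x < 0 else 1.0 - 0.5 * math.exp(-x)

def gaussian_copula_density(u, v, rho):
    """Density c(u, v; rho) of the bivariate Gaussian copula, for 0 < u, v < 1."""
    a = _STD_NORMAL.inv_cdf(u)
    b = _STD_NORMAL.inv_cdf(v)
    r2 = rho * rho
    exponent = -(r2 * (a * a + b * b) - 2.0 * rho * a * b) / (2.0 * (1.0 - r2))
    return math.exp(exponent) / math.sqrt(1.0 - r2)

def joint_pdf(x1, x2, rho):
    """f(x1, x2) = c(F1(x1), F2(x2); rho) * f1(x1) * f2(x2)."""
    c = gaussian_copula_density(cauchy_cdf(x1), laplace_cdf(x2), rho)
    return c * cauchy_pdf(x1) * laplace_pdf(x2)

# rho = 0 recovers independence: the joint p.d.f. is the product of the marginals.
print(math.isclose(joint_pdf(1.0, -0.5, 0.0), cauchy_pdf(1.0) * laplace_pdf(-0.5)))  # True
# Positive rho raises the density where both sensors take large values together.
print(joint_pdf(3.0, 3.0, 0.7) > joint_pdf(3.0, 3.0, 0.0))
```

The Gaussian copula is used here only because its density has a simple closed form; it is one of many available copula families, and the marginals can be swapped without changing the copula.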

1.3  Heavy-tailed  signals 

Many important stochastic phenomena cannot be adequately modeled with distributions that
decay exponentially in the tail. For such phenomena, extreme value measurements occur at a
significantly greater frequency than is attributable to distributions that decay exponentially in
the tail. One can typically observe a “spiky” signature in a time series plot of these
measurements, and such signals are often said to be fat-tailed or heavy-tailed.
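
The difference in tail behavior can be quantified with two closed-form cases. The sketch below compares P(|X| > t) for a standard normal variable and for a standard Cauchy variable; the Cauchy distribution is the alpha-stable law with alpha = 1, and it is chosen here only because its tail probability has a simple expression. The function names are illustrative.

```python
import math

def gaussian_two_sided_tail(t):
    """P(|X| > t) for a standard normal X."""
    return math.erfc(t / math.sqrt(2.0))

def cauchy_two_sided_tail(t):
    """P(|X| > t) for a standard Cauchy X (alpha-stable with alpha = 1)."""
    return 1.0 - 2.0 * math.atan(t) / math.pi

for t in (2.0, 5.0, 10.0):
    print(t, gaussian_two_sided_tail(t), cauchy_two_sided_tail(t))
```

The Gaussian tail probability collapses faster than exponentially as `t` grows, whereas the Cauchy tail probability decays only like 2/(pi t), so excursions that are practically impossible under a Gaussian model remain routine under the heavy-tailed one.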

Examples of such signals can be seen in applications such as finance, geology, climatology
and bioengineering. In many scenarios arising from these applications, the detection of
significant deviations or anomalies from a process describing a null hypothesis is an important
task. Many of these anomalies can be characterized as extreme-value deviations from the null
process, i.e., the anomalies occur with low probability and fall in the tail regions of the null
hypothesis distributions. These problems have been studied in detail when the underlying
distributions are well-behaved and easy to characterize. However, when distribution tails decay
at slower-than-exponential rates, the inference task becomes difficult because of modeling and
associated tractability issues.

When the observed data have heavy tails in their distribution, the consequences of improper
model selection become more severe. An anomalous process being observed by multiple
sensors such that observations are dependent implies that, effectively, the fusion center observes
extreme co-movements in the distribution tails. Such events are called tail-dependent events.
Development and selection of models capable of capturing this tail dependence, also called
extremal dependence, becomes an important component of the overall inference problem.
Extremal dependence is especially relevant in the context of modern portfolio theory. When there
is insufficient diversity within a portfolio, the associated risk increases for a given
expected return or profit. When distributions characterizing the associated risk do not capture the
tail heaviness or tail dependence, the likelihoods corresponding to high risk values are
underestimated, which affects the reliability of decisions. The application of improper models,
where tail dependence was inadequately quantified, was considered to be one of the causes of the
financial crisis of 2007-2008.
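
One common way to quantify the tail dependence described above is the upper tail dependence coefficient. The definition below is the standard one from the copula literature, written in notation assumed here ($F_1$, $F_2$ for the marginal c.d.f.s of $X_1$, $X_2$, and $C$ for their copula); the notation used later in this dissertation may differ.

```latex
\lambda_U \;=\; \lim_{u \to 1^-} P\!\left( X_2 > F_2^{-1}(u) \,\middle|\, X_1 > F_1^{-1}(u) \right)
          \;=\; \lim_{u \to 1^-} \frac{1 - 2u + C(u,u)}{1 - u}
```

A value $\lambda_U > 0$ indicates that joint extremes persist arbitrarily far into the upper tail, while $\lambda_U = 0$ indicates asymptotic independence in the upper tail, even if the variables are correlated in the body of the distribution.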

In this dissertation, we focus on footstep signals, acquired from an array of seismic sensors,
and show that they can be modeled as dependent heavy-tailed signals. Using these signals as
a motivating example, the theory for modeling and detecting spatially dependent heavy-tailed
signals is studied. The effect of model selection, as an integral part of the detection framework,
is also analyzed.

1.4  Literature  review 

Multisensor signal processing may be viewed as a subset of the broader field of information
fusion. Centralized formulations, where raw observations are available at the processing unit
or fusion center, are well known for several inference tasks and are available in standard
textbooks [12,47,94]. Distributed inference, on the other hand, relies on the availability of a
network that can either transmit local inferences/quantized measurements to the fusion
center or arrive at a consensus solution by locally exchanging compressed/quantized information.
While research in this area has forked in various directions, the problems addressed can be
categorized as either distributed detection [98] or decentralized estimation (e.g., see [66,71,78]
and references cited therein).

This  section  reviews  recent  progress  that  has  taken  place  in  the  field  of  multisensor  signal 
processing,  and  focuses  on  developments  where  dependence  information  plays  a  significant 
role  in  the  design.  The  aim  of  the  discussion,  as  presented,  is  to  motivate  the  relevance  of  our 
research  presented  in  this  dissertation.  One  of  the  major  themes  explored  in  this  dissertation 
is  the  concept  of  statistical  dependence.  Therefore,  this  section  discusses  the  literature  in  the 
context  of  different  types  of  dependence  models  that  have  been  employed  over  the  years.  The 
emphasis  on  dependence  notwithstanding,  the  literature  is  quite  extensive,  and  instead  of  being 
exhaustive,  we  concentrate  on  highlighting  newer  developments. 

1.4.1  Dependence  as  covariance 

Modeling  dependence  as  a  covariance  matrix  (or  equivalently  a  correlation  matrix)  is  arguably 
one  of  the  most  popular  ways  of  characterizing  dependence.  It  defines  the  dependence  of 
jointly  normal  random  variables  and  describes  the  linear  dependence  between  random  variables 
that  possess  a  finite  second  moment.  Due  to  the  inherent  simplicity  associated  with  the  use  of 
second  order  statistics  such  as  the  correlation  coefficient,  it  has  been  applied  in  various  contexts 
in  both  centralized  and  distributed  inference  schemes. 

Centralized  schemes  for  correlated  sensor  observations 

In the centralized paradigm, covariance-based dependence modeling is used extensively to
model the dependency information for array signal processing applications, especially where it
is reasonable to assume linearity of the medium of signal propagation. The most recent
applications where these concepts of array signal processing have been applied are MIMO radar [59]
and joint blind source separation (JBSS) [4], among others. In MIMO radar, several antenna
elements are used to transmit multiple probing signals that may be correlated or uncorrelated
with one another. While traditional blind source separation problems are formulated using a
single dataset, JBSS formulations are useful when analyzing multiple datasets as a group. An
example of this is separating speech and audio signals in multiple frequency bands.

The fusion of EEG with fMRI data for the detection of schizophrenia is discussed by
Correa et al. [19], where the brain tissue is modeled as a mixing channel, and hence the information
fusion problem is posed as a JBSS problem and is solved using an approach based on
multivariate canonical correlation analysis [48]. Canonical correlation analysis (CCA) is a technique
which transforms the data matrix in such a way that it maximizes the amount of correlation
between the entities exhibiting statistical dependence. It has also been used for audio-video
fusion: Slaney and Covell [86] use CCA to measure the synchrony between acoustic features
and video frames, while Kidron et al. [49] consider a CCA-based approach to determine pixels
in images that exhibit maximal correlation with the acquired audio signal.
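
For readers unfamiliar with CCA, the sketch below computes canonical correlations between two data matrices using a QR-based formulation (the singular values of the product of the two orthonormal bases). It is a generic illustration on synthetic data, under the assumption that both centered matrices have full column rank; it is not the procedure used in the cited works, and the function name is illustrative.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between the columns of X (n x p) and Y (n x q).

    After centering, orthonormal bases Qx and Qy are obtained by QR, and the
    singular values of Qx.T @ Qy are the canonical correlations.
    Assumes both centered matrices have full column rank.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)   # guard against round-off slightly above 1

rng = np.random.default_rng(1)
n = 500
X = rng.standard_normal((n, 3))
Y = np.column_stack([
    X[:, 0] + 0.1 * rng.standard_normal(n),   # strongly tied to X
    rng.standard_normal(n),                   # unrelated to X
])
rho = canonical_correlations(X, Y)
print(rho)   # first value close to 1, second value small
```

Because CCA is built on second-order statistics, it captures linear relationships between the two sets of variables.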

Distributed  inference  using  correlated  data 

Optimal schemes for distributed inference with correlated observations have also been a topic of
considerable interest. In the case of distributed detection, it has been shown that the likelihood
ratio based quantizer, which is optimal under the assumption of conditional independence,
is no longer optimal when correlation is taken into account. Examples of the consequent loss
in performance are provided by Aalo and Viswanathan [1]. In fact, earlier work by Tsitsiklis
and Athans [97] has shown that the distributed detection problem with dependent observations
is NP-complete. One way to get past the computational intractability is to assume some prior
knowledge about the joint statistics: Drakopoulos and Lee [25] examine the fusion rule for
distributed detection under dependence by considering that the correlation coefficient is known,
whereas Kam et al. [44] use the Bahadur-Lazarsfeld expansion of probability density functions.

Willett et al. [107] study the problem of distributed detection of a mean shift in
correlated Gaussian noise and establish how the nature of correlation affects the optimum fusion
rule. They conclude that even for a simple two-sensor and linear correlation formulation the
distributed detection problem “exhibits apparently very complicated behavior.” For this mean
shift in correlated Gaussian noise problem, local quantizers designed using the likelihood
ratio test (LRT) are, in general, not optimal. Willett et al. show that determining the parameter
regions where this optimality may hold is itself a challenging task: while the optimality of
the LRT can be determined for certain parameter regions, the problem is mostly intractable
for other regions. Chen et al. [16] have recently proposed a more general formulation of this
problem. They introduce a hidden variable that induces conditional independence among the
sensor observations so that many more distributed detection problems with dependent
observations become tractable. This new framework allows for the identification of several classes
of distributed detection problems with dependent observations whose optimal decision rules
resemble the ones for the conditionally independent case. The new framework induces a
decoupling effect on the forms of the optimal local decision rules for these problems, much in
the same way as the conditionally independent case. This is in sharp contrast to the general
dependent case where the coupling of the forms of local sensor decision rules often renders the
problem intractable. Such decoupling enables the use of, for example, the person-by-person
optimization approach to find optimal local decision rules. Two cases of distributed detection,
namely the detection of a deterministic signal in dependent noise and the detection of a random
signal in independent noise, have become tractable under this new framework.

The decentralized estimation problem with correlated observations has been studied by
Fang and Li [30]. They consider a power constrained wireless sensor network [92] and examine
power allocation for spatially correlated sensor observations. Each sensor transmits a
possibly nonlinear function of the parameter of interest, $\theta$, that is corrupted by
additive, correlated Gaussian noise. Bandwidth constrained formulations requiring quantized
transmissions to the fusion center are also considered by Ribeiro and Giannakis [78]. However,
they consider a linear observation model, with $\theta$ being deterministic but unknown, and
hence the sensor observations are conditionally independent. Krasnopeev et al. [52] present a
distributed estimation scheme for the problem $x_i = \theta + n_i$, where $x_i$ is the
measurement of sensor $i$ and the noise $\mathbf{n} = [n_i]$ is a multivariate Gaussian random
vector which is correlated spatially across sensors. The covariance is assumed to be known at
the fusion center. We note that all these problems are considered to be distributed since each
local sensor transmits some local estimate of $\theta$, which in its simplest form is the
noise corrupted parameter itself. These formulations do not consider local, inter-node
communication; the implications of this local communication aspect have been recently
investigated by Kar et al. [46].

1.4.2  Nonlinear  dependence:  nonparametric  approach 

Nonparametric approaches to multisensor signal processing have been very popular in
applications where it is infeasible to model a priori the complex dependencies that may exist
between the signals/features acquired by the sensors. These methods, in essence, estimate or
learn the joint distribution across sensor measurements directly from the data.

Machine learning techniques fall under this framework and are applicable largely when it
is feasible to control environment variables in such a way that a representative training
dataset may be collected. While this is apparently a stringent requirement, often with some
preprocessing, a significant amount of information can be extracted from sensor observations.
This has led to the successful application of machine-learning techniques for a wide variety
of problems. Learning based methodologies have been successfully applied to multibiometric
systems [11,79]. Multibiometric systems achieve superior personnel identification performance
by fusing information from two or more biometric modalities. The learning-based approach
has also been popular for solving several object classification tasks [43,64] and has
traditionally focused on security and surveillance applications [58,111]. Recently, challenges
unique to emerging technologies such as ubiquitous and human-centered computing have led to
new research in areas such as object tracking and affect recognition [108,109].

When viewed from an information fusion perspective, nonparametric designs offer tangible
advantages over methods described in Section 1.4.1. Fusion of heterogeneous or multimodal
information is possible since disparate modalities are not constrained to a multivariate
normal approximation. For example, Butz and Thiran [14] use the mutual information and joint
entropy between audio and video data as a measure of dependence; the joint density required
for the computation of these quantities is estimated from the data using the nonparametric
Parzen’s estimator [102]. Graphical models such as Bayesian networks generalize hidden Markov
models and have also been successfully used for audio-visual tracking [8,22,43]. Algorithms
for distributed fusion using graphical models have been developed by Çetin et al. [15].

1.4.3  Nonlinear  dependence:  copula-based  approach 

As indicated earlier, we employ copulas to characterize joint distributions. Copulas are
parametric functions that couple univariate marginal distribution functions to the
corresponding multivariate distribution function. A copula-based formulation is attractive
because the spatial correlation among sensor observations can manifest itself in several
different, potentially non-linear ways, and many families of copula functions have been
specified in the literature to address this issue. Further, while nonparametric formulations
are known to converge to the true distribution asymptotically, they also suffer from
scalability issues stemming from the curse of dimensionality. Recently, considerable progress
has been made in the study of copulas and their applications in statistics. The usage of
copulas is widespread in the fields of econometrics and finance [17] and they are beginning to
be used in the signal and image processing context [23,39,63,89].

In the fusion context, copulas first appear in the operations research literature, in a
paper by Jouini and Clemen [42]. They propose a copula-based method for the aggregation of
expert opinions. They take a Bayesian approach in their formulation: they consider Bayesian
decision-makers who make subjective assessments about the observed process. Each subjective
assessment is encoded as a univariate marginal distribution, which is combined with the
assessments of other experts using a copula. In order to elicit the copula parameter, they
propose a multivariate extension of Kendall’s tau as a measure of dependence between multiple
experts.

Sundaresan et al. [88] first considered the case of distributed detection for dependent
observations using a copula-based framework. They derived the optimum fusion rules for a
Neyman-Pearson detector. In their work, they found that the fusion rules under copula-based
dependence have a form similar to the Bahadur-Lazarsfeld expansions proposed by Kam et
al. Sundaresan and Varshney [87] also design and analyze the performance of a copula-based
estimation scheme for the localization of a radiation source.

Iyengar et al. [36] have investigated the general framework of copula-based detection of a
phenomenon being observed jointly by heterogeneous sensors. They quantify the performance
loss due to copula misspecification and demonstrate that a detector using a copula selection
scheme based on area under the receiver operating characteristic (ROC) can provide significant
improvement over models assuming independence. Their results on a NIST multibiometric
dataset show that the copula-based approach is versatile: it can fuse not only heterogeneous
sensor measurements but also the outputs of different algorithms. The tractability issue of
fusing dependent quantized data is addressed by Iyengar et al. [37]. In this paper, the
authors found that, by injecting a suitably designed noise variable, the optimum fusion rule
can be approximated for a minimum level of distortion. The problem of intractability, due to
the presence of multiple coupled integrals, is reduced to a problem of multiplying
characteristic functions, similar to the way in which frequency selective filtering is done.

1.5 Contributions and organization

The main contributions of the research results presented in this dissertation to the signal
processing and information fusion literature are as follows:

• A data-collection procedure was designed and executed to create a dataset of footstep
signals obtained using seismic sensors. The data can be used for data-driven problem-specific
tasks such as investigating procedures for indoor personnel occupancy detection
and activity classification. It also serves as an example of heavy-tailed data exhibiting
spatio-temporal dependence.

• The theory of detection for dependent $\alpha$-stable signals is studied using a
copula-based approach for dependence characterization. Issues such as model nesting and model
selection are studied in depth. We derive the necessary and sufficient conditions for
multivariate model nesting using a copula-based approach for distribution modeling. We
also derive asymptotic results for the probabilities of false alarm and detection, for both
nested and non-nested hypotheses, for detecting dependent heavy-tailed observations.

• A vine-based approach is proposed for modeling multisensor dependence. Using bivariate
copula building blocks, the vine-based approach allows us to construct multivariate
models free of symmetry constraints. The effect of model selection and node ordering is
investigated in the context of footstep detection. A tail-dependence motivated algorithm
is presented for establishing a node order for the base tree in the vine.

In  Chapter  2,  we  discuss  the  data-collection  process.  The  data  was  collected  from  a  linear 
array  of  geophone  seismic  sensors.  This  chapter  describes  the  sensor  and  data-acquisition 
hardware  and  also  the  data  collection  procedure.  The  heavy-tailed  nature  of  the  footstep  data 
provides  the  motivation  for  considering  detection  schemes  tailored  specifically  for  dependent 
heavy-tailed  data. 

Chapter 3 explores the background on statistical dependence and introduces copula theory.
Measures of dependence, other than the correlation coefficient, are surveyed and their
connections to copula functions are also summarized.

Chapter 4 examines the problem of detection of dependent $\alpha$-stable signals. We use the
class of $\alpha$-stable distributions to characterize the heavy-tailed nature of these
signals. For typical applications, sensors make simultaneous measurements of a given
phenomenon, and hence these heavy-tailed realizations are dependent across sensors. The
inter-sensor dependence is modeled using copulas. We consider a two-sided test in the
Neyman-Pearson framework and present an asymptotic analysis of the generalized likelihood
ratio test (GLRT). Both nested and non-nested models are considered in the analysis. The
performance of the proposed scheme is evaluated numerically on simulated data, as well as on
the indoor seismic data described in Chapter 2. With appropriately selected models, our
results demonstrate that a high probability of detection can be achieved for false alarm
probabilities of the order of $10^{-4}$. While the theory presented in this chapter is valid
for multiple-sensor deployments, we consider a two-sensor case for ease of exposition.

In Chapter 5, we address copula construction and model selection issues for the multisensor
(i.e., multivariate) case. Using the Neyman-Pearson approach, we show that accounting
for multivariate dependence leads to significant improvement over a bivariate approach, within
the copula framework. The tree-based technique of vines is used for modeling the dependence
across multiple sensors. The vine-based approach is able to model asymmetric dependence
between sensor observations.

Chapter  6  discusses  the  results  obtained  when  the  copula-based  detection  scheme  is  applied 
on  outdoor  data.  The  outdoor  data  was  provided  by  the  U.  S.  Army  Research  Laboratory  (ARL) 
and  was  collected  close  to  the  southwest  US  border.  The  chapter  discusses  the  results  obtained, 
for  footstep  detection,  when  seismic  data  was  fused  with  acoustic  data. 

Chapter 7 summarizes the salient concepts explored in this dissertation and examines
directions for future research for copula-based inference.




Chapter  2 

Indoor  Seismic  Data  -  Acquisition 
and  Analysis 


As indicated in Chapter 1, the research considered in this dissertation is motivated by
applications where, due to various considerations, only one-dimensional signals are available,
such as seismic or acoustic signals. In order to obtain a dataset that is representative of
such application scenarios, we collected seismic data by deploying an array of geophone
sensors in a typical indoor office environment. The performance of the detectors proposed in
Chapter 4 and Chapter 5 will be evaluated on the data thus collected. This chapter discusses
the physical characteristics of the sensors and the data collection hardware in Section 2.1,
and a description of the experiments for collecting the background and footstep data is
provided in Section 2.2.

2.1  Sensor  description  and  setup 

Six GS 20DX geophones were used for the experiments. The electrical details of a typical
sensor [31] are depicted in Figure 2.1(b) and the frequency response curve is depicted in
Figure 2.1(c). Transduction is achieved by means of a moving coil over a magnetic core. The
geophones are designed to be floor mounted. Floor-to-sensor contact was achieved by means of a
coupling bolt screwed to the sensor, which was held to the floor by means of a tripod base.
This was done since tight coupling was not feasible as it requires structural penetration by
means of a probe.

The sensors constitute a wired suite and are connected to a data-acquisition system (DAQ).
The DAQ is essentially an analog-to-digital (A/D) converter with preset amplification and
low-level software for device control. The DAQ used for these experiments was a model PDL-MF:
a PCI data-acquisition device developed by United Electronic Industries, Walpole, MA. The
data-acquisition card provides 8 analog input channels with an overall sampling rate of
50,000 samples/s and 16-bit quantization. At its maximum preset amplification factor of 10 it
can faithfully (i.e., without clipping) digitize a signal of amplitude ±1 V. While lower
amplifications can accommodate a wider range of signal amplitude, this setting was selected as
footsteps generate a voltage swing, in each sensor, of the order of only a few mV.

The DAQ was programmed, using a C++ library provided by the manufacturer, to acquire
data at 5 kHz. The DAQ was programmed on a non-real-time operating system (Microsoft
Windows XP Professional), and therefore sustaining file I/O at a 5 kHz sampling rate is
challenging. Sequential read-write execution leads to the read buffer in the A/D converter
filling up before the acquired data can be written, leading to data loss. This problem
exists in spite of the manufacturer providing a large capacity circular buffer. The problem
was solved by programming the read and write operations to execute as parallel threads.
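
The two-thread arrangement described above can be sketched as a producer-consumer pair. This is only an illustration in Python, not the C++ code used for the experiments: the function name `acquire`, the callables `read_block` and `write_block`, and the queue size are all hypothetical stand-ins for the vendor library's read call and the file writer.

```python
import queue
import threading


def acquire(read_block, write_block, n_blocks, max_pending=64):
    """Run device reads and file writes on separate threads.

    read_block and write_block are hypothetical callables standing in for
    the DAQ read call and the disk write; they are not a real vendor API.
    """
    pending = queue.Queue(maxsize=max_pending)  # bounded hand-off buffer
    sentinel = object()                         # marks the end of acquisition

    def reader():
        for _ in range(n_blocks):
            # The reader only waits here if the writer is max_pending blocks behind.
            pending.put(read_block())
        pending.put(sentinel)

    def writer():
        while True:
            block = pending.get()
            if block is sentinel:
                break
            write_block(block)

    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

With one reader and one writer sharing a first-in-first-out queue, blocks reach the writer in the order they were acquired, so a slow write delays later writes but does not stall the read loop until the hand-off buffer itself is full.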

Fig. 2.1: The GS 20DX geophone. (a) Sensor as housed and packaged. (b) Electrical details:
cable length and sensor polarity. (c) Frequency response curve.

2.2 Experiments

The six sensors were configured as a linear array and placed along the long edge of a
hallway (see Figure 2.2). Data was collected in two different building hallways of similar
construction. The distance between adjacent sensors was maintained at 5 ft. A sampling rate
of 5 kHz was selected so that, if necessary, high frequency information in the footstep data
[29] could be utilized for detection/classification. However, considering the frequency
response curve of the GS 20DX (see Fig. 2.1(c)) and the typical quasi-periodicity of footstep
signals, the raw signal was uniformly down-sampled to 1024 samples/second.

Background data was collected by leaving the sensors in an isolated environment. The
background data is approximately 4 minutes in duration. Multiple persons participated in the
footstep data collection. The footstep data collected consists of 120 single-person trials
(i.e., a given trial has exactly one participant walking along the hallway) and 120 two-person
trials (a given trial has exactly two participants walking along the hallway). Each dataset
consists of 60 trials from Building 1 and 60 trials from Building 2. The approximate duration
of the data collected per trial is 12 seconds. The background signal and the footstep signal
from a single-person trial are graphed in Fig. 2.3 and Fig. 2.4 respectively. The following
section presents some analysis of the collected data, focusing on the presence of nonlinearity
and the heavy-tailed behavior of the seismic data.


2.3  Preliminary  data  analysis 

In this section, we present an analysis of the data. The nonlinear nature of the data is
first explained. Signal nonlinearity, within the footsteps context, is strongly suggestive of
a nonlinear mixing medium of signal propagation. The tail behavior of the data is analyzed
next, and we demonstrate that the footstep data cannot be explained by a Gaussian or
exponential tail-decay model.

Fig. 2.2: Sensor setup in one of the buildings.

Fig. 2.3: Time series of the background signal.

Fig. 2.4: Time series of a footstep trial. Nonstationarity and the impulsive nature of the
signal are evident.

2.3.1 Nonlinearity analysis of observed data

A signal $y(t)$ is said to be nonlinear if its current value cannot be predicted or expressed
as a linear function of its past values. Let $i$ denote the sensor index, i.e.,
$i = 1, 2, \ldots, 6$. Each sensor observation, $y_i(t)$, is uniformly down-sampled to
1024 Hz. Each $y_i(t)$ is divided into 1 second overlapping frames. Denote by $\mathcal{T}_r$
the set of all time instants contained in the $r$th frame. Therefore, the cardinality of
$\mathcal{T}_r$, $|\mathcal{T}_r|$, is 1024. The inter-frame overlap was set to 50%.
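
The framing step can be sketched as follows. This is a minimal illustration, assuming a record of exactly 12 s at 1024 samples/s (the trial duration is only approximate in practice); the function name `frame_indices` is hypothetical.

```python
def frame_indices(n_samples, frame_len=1024, overlap=0.5):
    """Return (start, stop) sample indices of overlapping frames.

    frame_len = 1024 corresponds to 1 s at 1024 samples/s, and
    overlap = 0.5 to the 50% inter-frame overlap; stop is exclusive.
    """
    hop = int(frame_len * (1.0 - overlap))  # 512 samples between frame starts
    if hop <= 0:
        raise ValueError("overlap must be smaller than 1")
    last_start = n_samples - frame_len      # last start that still fits a full frame
    return [(s, s + frame_len) for s in range(0, last_start + 1, hop)]


# Assumed example: one 12 s trial sampled at 1024 samples/s.
frames = frame_indices(12 * 1024)
```

Under these assumptions each frame contains 1024 samples, consecutive frames start 512 samples apart, and any samples left over after the last full frame are discarded.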

The method of surrogate data [82] is used to analyze the acquired seismic time series for
the presence of nonlinearity. The null hypothesis states that the original time series is a
realization of a linear Gaussian process (or monotonic transforms thereof). The idea is to
generate a set of time series (surrogate data set) by resampling from the original
measurements so that linear statistical properties of the original data are preserved in the
surrogate data set. These surrogates are then, in essence, samples from a population
consistent with the null hypothesis of linearity and can be used to estimate the distribution
of a test statistic that can discriminate between the null (linearity) and alternative
(nonlinearity). This statistic is computed for both the surrogate data and the original time
series. If the statistic computed on the original time series lies (significantly) in the tail
of the distribution of the statistic corresponding to the surrogates, the null is rejected.

Following Schreiber [82], a third order statistic,

$$\phi^{\mathrm{rev}} = \frac{1}{N-1} \sum_{n=2}^{N} \left( y[n] - y[n-1] \right)^3, \qquad (2.1)$$

is used to test for nonlinearity in our analysis. Here $y[n]$ is the sampled version of $y(t)$
and $N$ is the number of samples in the $r$-th frame. A known property of a linear Gaussian
process is that its statistics are symmetric under time reversal [103];
$\phi^{\mathrm{rev}}$ measures the asymmetry of a series under time-reversal [82].
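
As a small worked illustration of the time-reversal statistic, the sketch below computes the mean cubed first difference of a frame. It assumes the sum is normalised by the number of first differences, N - 1, and the function name `phi_rev` is hypothetical.

```python
def phi_rev(y):
    """Time-reversal asymmetry: mean of the cubed first differences of y.

    Assumes normalisation by the number of differences, len(y) - 1.
    """
    n = len(y)
    if n < 2:
        raise ValueError("need at least two samples")
    return sum((y[k] - y[k - 1]) ** 3 for k in range(1, n)) / (n - 1)


# A symmetric triangle looks the same forwards and backwards ...
triangle = [0.0, 1.0, 2.0, 1.0, 0.0]
# ... whereas a slow rise followed by a sharp drop does not.
sawtooth = [0.0, 1.0, 2.0, 3.0, 0.0]
```

For the triangle the cubed rises and falls cancel, giving zero. For the sawtooth the single large drop dominates, giving a negative value, and reversing the series in time flips the sign of the statistic.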

Fig. 2.5: Test for nonlinearity. The histogram is generated using the surrogate data. The
statistic of the original time series is represented by the solid line labeled
$\phi_0^{\mathrm{rev}}$.

Each frame of $y_i(t)$ (for all $i$) is tested for the presence of nonlinearity as follows.
Forty surrogates, $s_k(t)$, $k = 1, 2, \ldots, 40$, are generated from a given frame of the
original time series $y_i(t)$ using the iterative amplitude adjusted Fourier transform (IAAFT)
[82]. The IAAFT algorithm is based on the amplitude adjusted Fourier transform (AAFT)
algorithm [93], in which a sequence sampled from a normal distribution is ranked and scaled so
that the amplitude spectrum of each surrogate matches that of the frame under test, while the
phase is randomized uniformly between 0 and $2\pi$. Schreiber notes that the AAFT algorithm is
correct only asymptotically: it generates surrogates that are linear and have the same
amplitude probability distribution (APD) as the original time series as $N \to \infty$. The
IAAFT algorithm, proposed in [83], iterates between amplitude adjustment and phase
randomization until the surrogates and the original data have the same APD. The statistic in
Eq. (2.1) is then computed for both the surrogates and the test data. For example, consider
Fig. 2.5. The solid line indicates the value of $\phi_0^{\mathrm{rev}}$, the statistic
computed for sensor data corresponding to a frame from the walking trials. The histogram of
$\phi^{\mathrm{rev}}$ computed for the corresponding surrogates is also shown. It is evident
that the footstep signal has a nonlinear structure, as $\phi_0^{\mathrm{rev}}$ does not lie
within the distribution under the null hypothesis of linearity. Thus, the null hypothesis can
be rejected.
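
A commonly used form of the iterative surrogate scheme can be sketched with NumPy as below. This is an illustration under stated assumptions rather than the implementation used for the reported results: the function name `iaaft_surrogate` is hypothetical, the iteration starts from a random shuffle of the frame, and a fixed iteration count replaces a convergence check.

```python
import numpy as np


def iaaft_surrogate(y, n_iter=100, rng=None):
    """Generate one surrogate of y by alternating two projections.

    Sketch only: fixed n_iter instead of a convergence test (an assumption).
    """
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y, dtype=float)
    target_amp = np.abs(np.fft.rfft(y))  # amplitude spectrum to be preserved
    sorted_y = np.sort(y)                # amplitude distribution to be preserved
    s = rng.permutation(y)               # random shuffle destroys temporal structure
    for _ in range(n_iter):
        # Step 1: impose the target amplitude spectrum, keep the current phases.
        phases = np.angle(np.fft.rfft(s))
        s = np.fft.irfft(target_amp * np.exp(1j * phases), n=len(y))
        # Step 2: rank-remap so the surrogate takes exactly the original values.
        ranks = np.argsort(np.argsort(s))
        s = sorted_y[ranks]
    return s
```

Because the loop ends on the rank-remapping step, the returned surrogate is an exact reordering of the original samples, while its amplitude spectrum matches that of the original only approximately.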

Each frame of the footstep signal is tested for the presence of nonlinearity at the 0.05
significance level ($\alpha = 0.05$) using a rank order test proposed by Theiler et al. (see
Section 2.1, [93]).

Table 2.1: Percentage of frames detected as nonlinear (footstep data).

Sensor i    1 s frame    2 s frame
1           25           20
2           19           26
3           12           14
4           11           11
5           17           20
6           19           24

The case when the frames are two seconds in duration is also considered and results are
summarized in Table 2.1. We observe that a significant proportion of the walking frames are
detected as nonlinear. However, the exact nature of the nonlinearity is not known and is
difficult to ascertain. This is also a different characterization of the nonstationarity
present in the data, and therefore motivates the use of semiparametric methods of inference
with such data. We also compared the values of $\phi^{\mathrm{rev}}$ obtained for the footstep
and background data. The standard tests for normality, such as the Jarque-Bera test, confirm
that the background data are normally distributed. After standardizing the footstep and
background time-series data, values of $\phi^{\mathrm{rev}}$ are computed over 1 s and 2 s
frames. In Table 2.2, $\bar{\phi}^{\mathrm{rev}}$, the mean value of $\phi^{\mathrm{rev}}$
over the total number of frames, along with the standard error
($\mathrm{SE}_{\phi^{\mathrm{rev}}}$), are shown for both frame durations. Similar to what we
observe in Fig. 2.5, the numbers reveal that for a linear Gaussian process, values of
$\phi^{\mathrm{rev}}$ are close to zero with narrow standard errors, implying that the values
of $\phi^{\mathrm{rev}}$ are spread over a narrow interval centered about 0. Values of
$\bar{\phi}^{\mathrm{rev}}$ for the footstep data, on the other hand, lie significantly
outside this region and are almost an order of magnitude greater than the typical
$\bar{\phi}^{\mathrm{rev}}$ values for linear Gaussian processes. However, since
$\phi^{\mathrm{rev}}$ estimates for footstep data also possess larger standard errors, several
time-series frames are classified as “linear”, as seen in Table 2.1.
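
The rank order decision for a single frame can be sketched as follows. The sketch assumes a two-sided test in which the null is rejected only when the statistic of the original frame is more extreme than every surrogate value; whether the reported results used a one-sided or two-sided variant is not stated here, and the function name is hypothetical.

```python
def rank_order_reject(stat_original, stat_surrogates, alpha=0.05):
    """Two-sided rank order test (assumed variant) for one frame.

    Under the null, the original statistic is equally likely to take any of
    the m + 1 ranks, so it is the smallest or the largest with probability
    2 / (m + 1); that probability is the size of the test.
    """
    m = len(stat_surrogates)
    if 2.0 / (m + 1) > alpha:
        raise ValueError("too few surrogates for this significance level")
    return (stat_original < min(stat_surrogates)
            or stat_original > max(stat_surrogates))
```

With 40 surrogates the size of this test is 2/41, which is just under 0.05. Because a frame is only declared nonlinear when its statistic falls outside the entire surrogate range, frames with moderately asymmetric statistics are retained as consistent with linearity.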

Table 2.2: Comparison of values of $\bar{\phi}^{\mathrm{rev}}$, with
[$\mathrm{SE}_{\phi^{\mathrm{rev}}}$], for footstep and background data.

Sensor i    Background                                      Footstep
            1 s frame               2 s frame               1 s frame               2 s frame
1           -0.0050 [12.67·10^-5]   -0.0035 [5.4·10^-5]     -0.0185 [6.1·10^-4]     -0.0115 [3.5·10^-4]
2            0.0005 [1.6·10^-5]      0.0004 [0.8·10^-5]      0.1557 [4.6·10^-3]      0.0938 [2.7·10^-3]
3           ≈ -10^-5 [2.2·10^-5]    ≈ -10^-5 [1.3·10^-5]     0.0120 [4.3·10^-4]      0.0139 [2.9·10^-4]
4           -0.0010 [2.7·10^-5]     -0.0013 [1.8·10^-5]      0.0205 [7.2·10^-4]      0.0248 [4.9·10^-4]
5           -0.0003 [2.3·10^-5]     -0.0003 [1.3·10^-5]      0.0147 [6.3·10^-4]      0.0176 [4·10^-4]
6           -0.0080 [7.6·10^-5]     -0.0084 [6.1·10^-5]      0.0056 [5.2·10^-4]      0.0079 [3.1·10^-4]

2.3.2  Tail  behavior  of  the  seismic  data 

Fig. 2.6: Probability distribution of the background data from Sensor 5, compared to the
p.d.f. of a normal distribution with the same second-order moments as the background data. The
Y-axis is plotted on a logarithmic scale.

We have also analyzed the collected seismic data for tail behavior. The background data
show the presence of exponential tails, and the footstep data show the presence of heavy tails
which decay at a polynomial rate. An example of this tail behavior is shown in Fig. 2.6 and
Fig. 2.7. We observe that not only do the tails of the background data have exponential decay,
but they do this at a slightly sub-Gaussian rate. This can be explained by considering the
physical nature of the geophone, which damps sudden (discontinuous) excursions of the signal.
An idea of this behavior can also be inferred from the frequency response of the sensor in
Fig. 2.1(c), which shows a damped response in the high-frequency regime. On the other hand,
the heavy-tailed behavior of the footstep data is clearly visible in Fig. 2.7. In fact, we can
even infer a polynomial decay in the tails of the footstep data. Note that for any
distribution decaying at a polynomial rate in the tails, i.e., as $|x|^{-\alpha-1}$, the
logarithm of this, i.e., $-(\alpha+1)\log|x|$, flattens out and appears to saturate for
extreme values of $x$. This is precisely the behavior we observe in the p.d.f. plot for
footstep data, where the Y-axis is plotted on a logarithmic scale. The significance of a tail
decay rate of $|x|^{-\alpha-1}$ is explained in Chapter 4.
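
The contrast between the two kinds of tail on a logarithmic Y-axis can be made concrete with a small numerical sketch. The values of $\alpha$ and of the evaluation points below are arbitrary illustrative choices, not estimates from the seismic data, and the function names are hypothetical.

```python
import math


def log_gaussian_pdf(x, sigma=1.0):
    """Natural log of the zero-mean normal p.d.f. with standard deviation sigma."""
    return -0.5 * (x / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def log_power_tail(x, alpha, scale=1.0):
    """Natural log of an unnormalised polynomial tail, scale * |x|^(-alpha - 1)."""
    return math.log(scale) - (alpha + 1.0) * math.log(abs(x))


# Illustrative comparison: change in log-density when moving from x = 10 to x = 20.
alpha = 1.5  # arbitrary choice of tail index for the example
gauss_drop = log_gaussian_pdf(20.0) - log_gaussian_pdf(10.0)
power_drop = log_power_tail(20.0, alpha) - log_power_tail(10.0, alpha)
```

Doubling $x$ lowers the Gaussian log-density by an amount that grows quadratically with $x$, whereas the polynomial tail always drops by the same constant, $(\alpha+1)\log 2$. On a logarithmic Y-axis the former plunges while the latter looks almost flat, which is the visual signature referred to in the text.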


Fig. 2.7: Probability distribution of the footstep data from Sensor 5, compared to the p.d.f.
of a normal distribution with the same second-order moments as the footstep data. The Y-axis
is plotted on a logarithmic scale.

2.4 Other datasets

The research in this dissertation took place in collaboration with the US Army Research
Laboratory (ARL). In this effort, we also collected data using the unattended ground sensor
(UGS) suite at ARL, at Adelphi, MD. Additional data, for an outdoor scenario, was collected by
ARL near the southwest US border. The copula-based methods proposed in Chapters 4 and 5 have
also been applied to these datasets. For the purpose of demonstrating our detection
methodology on real sensor data, in this dissertation we focus on the results obtained using
the dataset described in this chapter. However, we have applied similar methods to the indoor
and outdoor ARL datasets, and results based on these data are discussed in Chapter 6.

2.5  Summary 

In this chapter we analyzed the nature of seismic data collected using geophone sensors in
an indoor environment. The analysis revealed that the data corresponding to footstep activity
exhibit temporal nonlinearity, with heavy-tailed behavior. The background data, on the other
hand, are approximately normal. Time series plots of the footstep data also reveal that the
data are spatially dependent, but signal nonlinearity implies that the statistical dependence
exhibited by the data cannot be explained by simple models. A more sophisticated understanding
of statistical dependence is required, and appropriate models must be used for any inference
done using such data. While we have demonstrated the existence of complex spatio-temporal
behavior using footstep data, such signal characteristics can be seen in other types of data
too. The analyses presented in this chapter, therefore, motivate our research approach in this
dissertation. We address the general theory of detecting such spatially dependent heavy-tailed
data, and return to the footstep data example to apply our proposed methods.
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Chapter  3 

Statistical  Dependence  and 
Copula  Theory 


Chapter 1 reviewed recent research on signal processing for stochastically dependent
observations. As noted in Section 1.4, parametric, semi-parametric and non-parametric techniques
of dependence characterization have been extensively studied, and they find utility in a variety
of applications. Consequently, research that accounts for dependence in disciplines such as
machine learning, information theory, speech processing, finance, and aerospace, among others,
has led to a rich body of literature. Dependence modeling in this dissertation is based on
copula theory, which can be categorized as either a parametric or a semi-parametric approach,
depending on the formulation being considered. In this chapter, concepts and measures of
dependence are discussed (Section 3.1), followed by an overview of copula theory (Section 3.2).

3.1 Bivariate statistical dependence

The topic of stochastic dependence has been studied extensively since Karl Pearson first
defined the product-moment correlation. This section discusses several concepts and measures of
bivariate dependence that have since sought to generalize Pearson’s correlation coefficient. We
focus on bivariate dependence because many concepts of multivariate dependence do not carry
over as a simple extension of the bivariate case. Further, when exploring multivariate dependence
in Chapter 5, we use a pairwise scheme based on the concept of vines. The topics covered here
summarize a more detailed treatment of dependence concepts by Balakrishnan and Lai (see [7],
Chapter 3 and Chapter 4). The discussion in the next section shows how a copula-based
characterization of joint distributions relates to these generalized descriptions of dependence.

3.1.1  Positive  and  negative  dependence 

For two continuous random variables, $X$ and $Y$, positive dependence implies that large/small
values of $Y$ tend to accompany large/small values of $X$. In contrast, negative dependence
implies that large/small values of $Y$ tend to accompany small/large values of $X$. We discuss only
concepts that are derived from positive dependence, since the negative dependence counterparts
are analogous. Further, if the pair $(X, Y)$ has positive dependence, then $(X, -Y)$ has
negative dependence on $\mathbb{R}^2$. If there exists a constraint of positivity, $(X, 1 - Y)$ has negative
dependence on the unit square. An important point to note is that, while one may define positive
dependence for the multivariate case, negative dependence is then no longer a mirror reflection of
positive dependence. Six basic conditions describing positive dependence have been discussed
in the literature [55]. These are enumerated below in increasing order of stringency.

1. Positive correlation. Defined for positive linear correlation, i.e., $\mathrm{cov}(X, Y) \geq 0$.

2. Positive quadrant dependence (PQD). $P(X > x, Y > y) \geq P(X > x)P(Y > y)$, or
equivalently, $P(X \leq x, Y \leq y) \geq P(X \leq x)P(Y \leq y)$.

3. Association. $X$ and $Y$ are said to be positively associated if, for every pair of functions
$a$ and $b$ defined on $\mathbb{R}^2$ which are increasing in each of the arguments separately,

\[ \mathrm{cov}[a(X, Y), b(X, Y)] \geq 0. \]

Lai and Xie note that a direct verification of association is difficult [55]. It is often
simpler to verify one or more of the conditions that follow, which are more stringent and
thus imply association.

4. Tail dependence. $Y$ is right-tail increasing in $X$, denoted as RTI($Y|X$), if
$P(Y > y \mid X > x)$ increases in $x$ for all $y$. Similarly, $Y$ is left-tail decreasing in $X$,
written as LTD($Y|X$), if $P(Y \leq y \mid X \leq x)$ decreases in $x$ for all $y$.

5. Stochastically increasing (SI). $Y$ is said to be stochastically increasing in $X$, denoted as
SI($Y|X$), if for every $y$, $P(Y > y \mid X = x)$ is increasing in $x$. SI($X|Y$) can be defined in
a similar manner. If $Y$ is SI in $X$, $E(Y \mid X = x)$ is also increasing in $x$.

6. Total positivity of order 2. Let $X$ and $Y$ have a joint density $f(x, y)$. Then $f$ is said to
be totally positive of order 2 (TP2) if for all $x_1 < x_2$, $y_1 < y_2$,

\[ f(x_1, y_1) f(x_2, y_2) \geq f(x_1, y_2) f(x_2, y_1). \]

TP2 is also referred to as $X$ and $Y$ being likelihood ratio dependent (LRD).

Since these conditions are listed in increasing order of stringency, (6) $\Rightarrow$ (5) $\Rightarrow$ (4)
$\Rightarrow$ (3) $\Rightarrow$ (2) $\Rightarrow$ (1). When the inequality signs of the relations described in (1) through (6)
are reversed, we obtain analogous negative dependence concepts. Specifically, the duals of (2),
(4), (5) and (6) are respectively called negative quadrant dependence, right-tail decreasing/left-tail
increasing dependence, stochastically decreasing dependence and reverse regular of order 2.
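The ordering of these conditions can be illustrated on a small discrete example. The sketch below is only an illustration under stated assumptions: it uses a discrete analogue of the definitions (a probability mass matrix in place of a joint density), and the two matrices `TP2_PMF` and `PQD_ONLY_PMF` are made-up values chosen for this purpose, not data from this dissertation. It checks that a TP2 table is also PQD with non-negative covariance, and that the converse can fail.

```python
from itertools import combinations

def is_tp2(p, tol=1e-12):
    """Discrete analogue of condition (6): every 2x2 minor of the pmf is non-negative."""
    rows, cols = range(len(p)), range(len(p[0]))
    return all(
        p[i1][j1] * p[i2][j2] >= p[i1][j2] * p[i2][j1] - tol
        for i1, i2 in combinations(rows, 2)   # pairs with i1 < i2
        for j1, j2 in combinations(cols, 2)   # pairs with j1 < j2
    )

def is_pqd(p, tol=1e-12):
    """Condition (2): P(X <= i, Y <= j) >= P(X <= i) P(Y <= j) for all i, j."""
    rows, cols = len(p), len(p[0])
    for i in range(rows):
        for j in range(cols):
            joint = sum(p[a][b] for a in range(i + 1) for b in range(j + 1))
            px = sum(p[a][b] for a in range(i + 1) for b in range(cols))
            py = sum(p[a][b] for a in range(rows) for b in range(j + 1))
            if joint < px * py - tol:
                return False
    return True

def covariance(p):
    """cov(X, Y) when X and Y take the row and column indices as values."""
    cells = [(i, j, p[i][j]) for i in range(len(p)) for j in range(len(p[0]))]
    ex = sum(i * w for i, _, w in cells)
    ey = sum(j * w for _, j, w in cells)
    exy = sum(i * j * w for i, j, w in cells)
    return exy - ex * ey

# Hypothetical pmf that satisfies TP2 (mass concentrated smoothly on the diagonal).
TP2_PMF = [[0.20, 0.10, 0.02],
           [0.10, 0.16, 0.10],
           [0.02, 0.10, 0.20]]

# Hypothetical pmf that is PQD but violates TP2 (zeros next to positive corner mass).
PQD_ONLY_PMF = [[0.30, 0.00, 0.05],
                [0.00, 0.30, 0.00],
                [0.05, 0.00, 0.30]]

for name, pmf in (("TP2 example", TP2_PMF), ("PQD-only example", PQD_ONLY_PMF)):
    print(name, "TP2:", is_tp2(pmf), "PQD:", is_pqd(pmf), "cov:", round(covariance(pmf), 4))
```

The second table shows why the implications run in one direction only: a weaker condition such as PQD can hold without the stronger TP2 condition.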

3.1.2  Measures  of  dependence 

Measures of dependence quantify, in some particular manner, how closely the variables $X$ and
$Y$ are related. Since a single number cannot completely explain the nature of dependence,
a variety of measures are defined and used. The following list is not comprehensive, but
represents some of the more important measures of dependence that have been proposed.

1. Pearson’s correlation. This is a well studied measure in statistics and is presented here
for completeness. Pearson’s coefficient of correlation is given by

\[ \rho = \frac{\mathrm{cov}(X, Y)}{\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}}. \]

It may be noted that $\rho$ measures only linear dependence. Furthermore, there exist
well-known examples where $X$ and $Y$ are dependent, but $\rho = 0$. For example, Melnick
and Tenenbein [62] have analyzed the following case. Let $X \sim \mathcal{N}(0, 1)$ and define $Y$
such that, for $\lambda > 0$,

\[ Y = \begin{cases} X & \text{if } |X| \leq \lambda \\ -X & \text{if } |X| > \lambda \end{cases} \tag{3.1} \]

We can verify that $Y \sim \mathcal{N}(0, 1)$, since

\begin{align}
P(Y \leq t) &= P(|X| \leq \lambda \wedge X \leq t) + P(|X| > \lambda \wedge -X \leq t) \tag{3.2} \\
            &= P(|X| \leq \lambda \wedge X \leq t) + P(|X| > \lambda \wedge X \leq t) \notag \\
            &= P(X \leq t), \tag{3.3}
\end{align}

where the step following (3.2) uses the symmetry of $\mathcal{N}(0, 1)$. Denote the p.d.f. of $X$ as $f_X$ and
its CDF as $F_X$. The correlation coefficient can be calculated as

\[ \rho = E[XY] = 2\int_{0}^{\lambda} x^2 f_X(x)\,dx - 2\int_{\lambda}^{\infty} x^2 f_X(x)\,dx
        = 4\int_{0}^{\lambda} x^2 f_X(x)\,dx - 1. \tag{3.4} \]

Solving for $\lambda$ by setting $\rho = 0$ in (3.4), Melnick and Tenenbein obtained $\lambda \approx 1.54$;
for this value of $\lambda$, $\rho = 0$ even though $X$ and $Y$ are dependent. Note that $X$ and $Y$ are
not jointly normal, i.e., $f_{XY}$ is not a bivariate normal p.d.f., and hence their dependence
structure is not completely explained by $\rho$.


2. Mutual information. Mutual information between $X$ and $Y$ is defined as

\[ I(X; Y) = \int_{\mathbb{R}^2} \log\left( \frac{f_{XY}(x, y)}{f_X(x) f_Y(y)} \right) dF_{XY}(x, y), \]

and it measures the distance between the joint density and the product of marginals,
i.e., the joint density if $X$, $Y$ were independent. Multiinformation is the multivariate
extension of mutual information proposed by Joe [40]. For the vector $X \in \mathbb{R}^n$, $n \geq 2$,

\[ I(X) = \int_{\mathbb{R}^n} \log\left( \frac{f_X(x)}{\prod_{i=1}^{n} f_{X_i}(x_i)} \right) dF_X(x). \]

A normalization of the form $\delta^* = \sqrt{1 - \exp(-2I)}$ ensures that mutual information and
multiinformation follow Rényi’s postulates [77] for “an appropriate measure of dependence”.
In particular, $\delta^* \in [0, 1]$.

3. Rank correlations. Rank correlations measure the dependence between rankings, rather
than between actual values, of $X$ and $Y$. Therefore, rank measures are unaffected by any
increasing transformation of $X$ and $Y$, while $\rho$ is unaffected only by linear transformations.
Kendall’s tau ($\tau$) and Spearman’s rho ($\rho_s$) are widely used measures that fall in this
category. For independent pairs of random variables $(X_1, Y_1)$ and $(X_2, Y_2)$ having the
same distribution as $(X, Y)$, concordance is defined as the condition that
$(X_1 - X_2)(Y_1 - Y_2) > 0$ and discordance is defined as the condition that $(X_1 - X_2)(Y_1 - Y_2) < 0$.
Kendall’s tau is defined to be the difference between the probabilities of concordance
and discordance:

\[ \tau \triangleq P[(X_1 - X_2)(Y_1 - Y_2) > 0] - P[(X_1 - X_2)(Y_1 - Y_2) < 0]. \]

This definition is equivalent to

\[ \tau = \mathrm{cov}[\mathrm{sgn}(X_1 - X_2), \mathrm{sgn}(Y_1 - Y_2)]. \]

Kendall’s tau is also a measure of total positivity: $\tau/2$ represents an average measure of
the total positivity for $f_{XY}$, the joint density of $X$ and $Y$.

Spearman’s rho is defined as follows. Let $(X_i, Y_i)$, $i = 1, 2, 3$, be three independent pairs
of random variables with a common distribution function. Then,

\[ \rho_s = 3 \left\{ P[(X_1 - X_2)(Y_1 - Y_3) > 0] - P[(X_1 - X_2)(Y_1 - Y_3) < 0] \right\}. \]

Spearman’s rho represents an average measure of quadrant dependence: $\rho_s \geq 0 \Leftarrow$
$(X, Y)$ are PQD.

4. Blomqvist’s $\beta$. This measure evaluates the dependence at the center of a distribution,
where the center is defined by $(\tilde{x}, \tilde{y})$, the medians of the two marginals. Hence, $\beta$ is also
referred to as the medial correlation coefficient. Blomqvist’s $\beta$ is defined as

\[ \beta = 2P[(X - \tilde{x})(Y - \tilde{y}) > 0] - 1. \tag{3.5} \]


5. Local measures of dependence. Anscombe’s quartet refers to four datasets of $(X, Y)$ pairs
that have identical coefficients of correlation [5]. All four sets of data also have identical
first and second order moments. While first constructed to demonstrate the importance of
graphing data before analyzing it, the dataset also reveals the global nature of $\rho$, i.e.,
it is defined from the second moment, which is in turn an expectation evaluated over the
entire plane. In other words, while global summary statistics are useful descriptors of the
data, they often fall short of providing a complete picture of the true variability that
exists in the data set. In fact, all of the above measures are global measures. Pairs $(X, Y)$
and $(X', Y')$ can have different distributions and yet have the same global measure. A local
measure of dependence allows one to compare the variation of dependence between the two
pairs. Several local measures of dependence have been proposed in the literature, mostly as
extensions of global dependence measures. Some of them are listed below.

• Local correlation coefficient. Let $\mu(x) = E(Y \mid X = x)$, $\sigma^2(x) = \mathrm{var}(Y \mid X = x)$
and $\beta(x) = \frac{\partial}{\partial x}\mu(x)$. The local correlation coefficient is then defined as

\[ \rho(x) = \frac{\sigma_X \beta(x)}{\sqrt{[\sigma_X \beta(x)]^2 + \sigma^2(x)}}, \]

where $\sigma_X$ is the standard deviation of $X$. When defined in this manner, $\rho(x)$ shares
a few properties with its global counterpart: it takes values between $-1$ and $1$;
independence of $X$ and $Y$ implies that $\rho(x) = 0$; and $\rho(x) = \pm 1$ for almost all $x$
is equivalent to $Y$ being a function of $X$. It is also invariant to scaling, but is not
marginal free. The latter point means that if we define $U = F_X(X)$ and $V = F_Y(Y)$,
the resulting $\rho(u)$ is different from $\rho(x)$.

• Local $\tau$ and $\rho_s$. Local measures of rank correlation exist, and are evaluated on an
open neighborhood about a point of interest, $(x_0, y_0)$. The functional form is more
easily defined using copulas, and is deferred to Section 3.2.

• Local measure of LRD. An index that can be used to measure likelihood ratio
dependence (LRD) locally is the second order partial derivative of the logarithm of
the density function,

\[ \gamma(x, y) = \frac{\partial^2}{\partial x\,\partial y} \log f_{XY}(x, y). \]

Recall that saying $X$ and $Y$ are LRD is synonymous with stating that $f_{XY}(x, y)$ is
TP2. It can be shown that $\gamma(x, y) \geq 0\ \forall x, y \iff f_{XY}(x, y)$ is TP2. This index has
several attractive properties; significantly, $\gamma(x, y) = 0$ if and only if $X$ and $Y$ are
independent. Furthermore, $\gamma(x, y)$ is marginal-free.

[Figure 3.1: four scatter-plot panels, Dataset 1 through Dataset 4.]

Fig. 3.1: Anscombe’s quartet. All 4 datasets have identical summary statistics: mean of
$X_i$, $\mu_{X_i} = 9$; variance of $X_i$, $\sigma^2_{X_i} = 11$; mean of $Y_i$, $\mu_{Y_i} = 7.5$; variance of $Y_i$, $\sigma^2_{Y_i} = 4.12$;
correlation $\rho = 0.816$ $\forall i = 1, 2, 3, 4$.
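The Melnick and Tenenbein construction in (3.1) through (3.4) lends itself to a quick numerical check. The following sketch is an illustration only, using the Python standard library: it solves $\rho(\lambda) = 0$ by bisection, using the closed form $\int_0^\lambda x^2 f_X(x)\,dx = F_X(\lambda) - 1/2 - \lambda f_X(\lambda)$ (integration by parts), and then simulates the pair $(X, Y)$. The sample size, seed and function names are arbitrary choices made here.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)

def correlation(lam):
    """rho(lambda) from (3.4): 4 * int_0^lambda x^2 f_X(x) dx - 1."""
    partial_second_moment = STD_NORMAL.cdf(lam) - 0.5 - lam * STD_NORMAL.pdf(lam)
    return 4.0 * partial_second_moment - 1.0

def solve_lambda(lo=0.0, hi=5.0, tol=1e-10):
    """Bisection for rho(lambda) = 0; rho increases from -1 towards 1 on [0, inf)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if correlation(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def simulate(lam, n=100_000, seed=0):
    """Draw X ~ N(0, 1) and build Y as in (3.1)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [x if abs(x) <= lam else -x for x in xs]
    return xs, ys

def sample_correlation(a, b):
    """Plain Pearson sample correlation."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

LAMBDA = solve_lambda()
xs, ys = simulate(LAMBDA)
rho_xy = sample_correlation(xs, ys)
# |Y| = |X| by construction, so the squares are perfectly correlated.
rho_squares = sample_correlation([x * x for x in xs], [y * y for y in ys])
print(f"lambda = {LAMBDA:.4f}, corr(X, Y) = {rho_xy:.4f}, corr(X^2, Y^2) = {rho_squares:.4f}")
```

The near-zero correlation between $X$ and $Y$, together with the perfect correlation between their squares, shows in a small experiment why $\rho$ alone cannot describe the dependence of this pair.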


3.2  Copula  theory 

Copulas, typically defined as cumulative distribution functions (CDFs), are parametric
functionals that associate or “couple” disparate univariate marginal distributions to a multivariate
distribution. The parametrization quantifies the dependence between the random variables over
which the copula is defined. The dependence parameter is not explicitly specified in this
section; it is introduced in Section 4.2.2, as it is more relevant in the context of inference. Sklar’s
theorem is an important result that specifies the framework necessary for copula-based
inference [65]. Without loss of generality, the random variables are defined over $\mathbb{R} = [-\infty, \infty]$.

Theorem 3.1 (Sklar’s Theorem). Let $F_Z$ be a cumulative distribution function defined over the
$n$-dimensional random vector $Z = [Z_1, Z_2, \ldots, Z_n]^T$, for which the corresponding marginal
distribution functions are $F_{Z_1}, F_{Z_2}, \ldots, F_{Z_n}$. Then there exists a copula $C$ such that, for all $z \in \mathbb{R}^n$,

\[ F_Z(z_1, \ldots, z_n) = C(F_{Z_1}(z_1), \ldots, F_{Z_n}(z_n)). \tag{3.6} \]

If $F_{Z_i}$ is continuous for $1 \leq i \leq n$, then $C$ is unique; otherwise it is determined uniquely on
$\mathrm{Ran}F_{Z_1} \times \ldots \times \mathrm{Ran}F_{Z_n}$, where $\mathrm{Ran}F_{Z_i}$ is the range of $F_{Z_i}$. Conversely, given a copula $C$ and
univariate distributions $F_{Z_1}, \ldots, F_{Z_n}$, $F_Z$ as defined in (3.6) is a valid multivariate CDF with
marginals $F_{Z_1}, \ldots, F_{Z_n}$.

Note that (3.6) implies that the copula function is a joint distribution of uniformly
distributed random variables. As a direct consequence of Sklar’s Theorem, for continuous
distributions, the joint p.d.f. is obtained by differentiating (3.6),

\[ f_Z(z) = \left( \prod_{i=1}^{n} f_{Z_i}(z_i) \right) c(F_{Z_1}(z_1), \ldots, F_{Z_n}(z_n)), \tag{3.7} \]

where $z = [z_1, \ldots, z_n]^T$ and $c(\cdot)$, called the copula density, is obtained as the mixed derivative
of $C$,

\[ c(\cdot) = \frac{\partial^n}{\partial u_1 \cdots \partial u_n} C(u_1, \ldots, u_n), \tag{3.8} \]

where $u_i = F_{Z_i}(z_i) \sim \mathcal{U}(0, 1)$. Using (3.7), we can construct a joint density function with
specified marginal densities.

Note that $C(\cdot)$ is a valid CDF and $c(\cdot)$ is a valid p.d.f. for uniformly distributed random
variables, $u_i$. Many different types of signals have well-understood marginal sensor models,
established either through physics-based theory or direct empirical evidence. An
application-specific understanding of dependence, however, is more difficult. Various families of copula
functions, describing different types of dependence, have been proposed in the literature [65].
However, it is often unclear which copula function should be used in a given case, as different
copula functions may characterize different types of dependence behavior among the random
variables [60]. A brief summary of some popularly used copula functions is given next. In
the following discussion, for notational brevity, we denote the $n$-tuple $(u_1, \ldots, u_n)$ as $u$.

3.2.1 Summary of some copula functions

Copulas  derived  from  distributions 

Multivariate distribution functions specify dependence structures, and copula functions can be
derived from them. Two such copula functions are the Gaussian and the $t$ copula functions,
which are derived from the multivariate Gaussian and Student-$t$ distributions, respectively. Both
specify dependence using the correlation matrix and are given as follows.

The Gaussian copula is defined as

\[ C_{\mathcal{N}}(u; \Sigma) = F_{\mathcal{N}}(F_{\mathcal{N}}^{-1}(u_1), \ldots, F_{\mathcal{N}}^{-1}(u_n); \Sigma), \tag{3.9} \]

where $F_{\mathcal{N}}(\cdot; \Sigma)$ denotes the multivariate normal CDF with correlation matrix $\Sigma$ and $F_{\mathcal{N}}^{-1}$
denotes the inverse CDF of the standard normal. The corresponding copula density function
is

\[ c_{\mathcal{N}}(u; \Sigma) = \frac{1}{|\Sigma|^{1/2}} \exp\left( -\frac{1}{2}\, \omega^T (\Sigma^{-1} - I)\, \omega \right), \tag{3.10} \]

where $\omega = [\omega_1, \ldots, \omega_i, \ldots, \omega_n]^T$ with $\omega_i = F_{\mathcal{N}}^{-1}(u_i)$ and $I$ is the identity matrix.
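As a concrete instance of (3.7) and (3.10), the sketch below evaluates a bivariate joint density built from a Gaussian copula and two marginals. It is a minimal illustration under stated assumptions: the correlation value, the choice of a unit-rate exponential as the second marginal, and all function names are hypothetical and introduced here only for the example. With two standard normal marginals the construction should reproduce the bivariate normal p.d.f., which gives a simple consistency check.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def gaussian_copula_density(u, v, rho):
    """Bivariate case of (3.10) with correlation matrix [[1, rho], [rho, 1]]."""
    a = STD_NORMAL.inv_cdf(u)
    b = STD_NORMAL.inv_cdf(v)
    det = 1.0 - rho * rho
    # omega^T (Sigma^{-1} - I) omega written out for the 2x2 case
    quad = (rho * rho * (a * a + b * b) - 2.0 * rho * a * b) / det
    return math.exp(-0.5 * quad) / math.sqrt(det)

def joint_density(x, y, rho, marginal_x, marginal_y):
    """Equation (3.7): product of the marginal p.d.f.s times the copula density.

    Each marginal is a (pdf, cdf) pair of callables.
    """
    pdf_x, cdf_x = marginal_x
    pdf_y, cdf_y = marginal_y
    return pdf_x(x) * pdf_y(y) * gaussian_copula_density(cdf_x(x), cdf_y(y), rho)

def bivariate_normal_pdf(x, y, rho):
    """Reference density: standard bivariate normal with correlation rho."""
    det = 1.0 - rho * rho
    quad = (x * x - 2.0 * rho * x * y + y * y) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

# Marginal models (assumed for illustration only).
normal_marginal = (STD_NORMAL.pdf, STD_NORMAL.cdf)
exponential_marginal = (lambda y: math.exp(-y), lambda y: 1.0 - math.exp(-y))

RHO = 0.6
same_family = joint_density(0.3, -1.1, RHO, normal_marginal, normal_marginal)
heterogeneous = joint_density(0.3, 0.7, RHO, normal_marginal, exponential_marginal)
print(same_family, bivariate_normal_pdf(0.3, -1.1, RHO), heterogeneous)
```

The second call shows the point of the construction: the same copula density couples marginals from different families into a valid joint density.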
Similarly, the $t$-copula is defined as

\[ C_t(u; \Sigma, \nu) = t_{\nu,\Sigma}(t_\nu^{-1}(u_1), \ldots, t_\nu^{-1}(u_n)) = t_{\nu,\Sigma}(\xi_1, \ldots, \xi_n), \tag{3.11} \]

where $t_{\nu,\Sigma}$ is the multivariate Student-$t$ distribution with correlation matrix $\Sigma$ and $\nu$ degrees
of freedom, $t_\nu$ denotes the univariate Student-$t$ distribution with $\nu$ degrees of freedom, and
$\xi_i = t_\nu^{-1}(u_i)$. As $\nu \to \infty$, the $t$ copula approaches the Gaussian copula. Let $\xi$ denote the column
vector of $\xi_i$, $1 \leq i \leq n$. The density function for the $t$-copula is given by

\[ c_t(u; \Sigma, \nu) = \frac{\Gamma\left(\frac{\nu + n}{2}\right) \Gamma\left(\frac{\nu}{2}\right)^{n-1} \left(1 + \nu^{-1} \xi^T \Sigma^{-1} \xi\right)^{-(\nu + n)/2}}
{|\Sigma|^{1/2}\, \Gamma\left(\frac{\nu + 1}{2}\right)^{n} \prod_{i=1}^{n} \left(1 + \frac{\xi_i^2}{\nu}\right)^{-(\nu + 1)/2}}. \tag{3.12} \]

Both the Gaussian and the $t$ copula functions belong to the elliptical family of copulas.


Archimedean  copulas 

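Archimedean copulas, describing an $n$-variate CDF, are defined as follows,

\[ C(u; \phi) = \psi_\phi^{[-1]}\left( \sum_{i=1}^{n} \psi_\phi(u_i) \right), \tag{3.13} \]

where $\psi_\phi : (0, 1] \mapsto [0, \infty)$ is a convex, strictly decreasing function with a positive second
derivative and $\psi_\phi(1) = 0$. The function $\psi_\phi$ is referred to as the generator function and $\phi$ is the
copula parameter specifying dependence. The inverse for the generator is defined as

\[ \psi_\phi^{[-1]}(s) = \begin{cases} \psi_\phi^{-1}(s) & \text{for } 0 \leq s \leq \psi_\phi(0) \\ 0 & \text{for } \psi_\phi(0) \leq s \leq \infty \end{cases} \tag{3.14} \]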

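While the copula density is more useful for statistical inference, it is more difficult to
derive a usable expression for every Archimedean copula. Using (3.8), we can write

\[ c(u; \phi) = \left(\psi_\phi^{-1}\right)^{(n)}\left( \sum_{i=1}^{n} \psi_\phi(u_i) \right) \prod_{i=1}^{n} \frac{d}{du_i} \psi_\phi(u_i), \tag{3.15} \]

where the superscript $(n)$ refers to the $n$-th order derivative of $\psi_\phi^{-1}$ with respect to its
argument, $\sum_i \psi_\phi(u_i)$. For a bivariate Archimedean copula this resolves to

\[ c(u_1, u_2) = -\frac{\psi_\phi''(C(u_1, u_2))\, \psi_\phi'(u_1)\, \psi_\phi'(u_2)}{[\psi_\phi'(C(u_1, u_2))]^3}. \tag{3.16} \]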

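The Clayton, Frank and Gumbel copulas are commonly used examples of the Archimedean
copula family and are defined next.

Clayton copula

The generator function for the Clayton copula is

\[ \psi_\phi(u) = \frac{1}{\phi}\left( u^{-\phi} - 1 \right), \quad \phi \in [-1, \infty) \setminus \{0\}, \tag{3.17} \]

and, therefore, the copula CDF is given by

\[ C_{Cl}(u; \phi) = \left( \sum_{i=1}^{n} u_i^{-\phi} - n + 1 \right)^{-1/\phi}, \quad \phi \in [-1, \infty) \setminus \{0\}, \tag{3.18} \]

and the copula density function can be obtained upon differentiation as

\[ c_{Cl}(u; \phi) = \phi^n\, \frac{\Gamma\left(\frac{1}{\phi} + n\right)}{\Gamma\left(\frac{1}{\phi}\right)}
\left( \prod_{i=1}^{n} u_i \right)^{-\phi - 1} \left( \sum_{i=1}^{n} u_i^{-\phi} - n + 1 \right)^{-\frac{1}{\phi} - n}. \tag{3.19} \]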
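For the bivariate case with $\phi > 0$, the Clayton expressions can be checked numerically. The sketch below is an illustration only: it evaluates (3.18) and (3.19) for $n = 2$ (where $\phi^2\,\Gamma(1/\phi + 2)/\Gamma(1/\phi) = 1 + \phi$), compares the density against a finite-difference mixed derivative of the CDF as in (3.8), and draws samples by conditional inversion. The sampling recipe is a standard technique that is not described in this chapter; the step size, seed and function names are choices made here.

```python
import random

def clayton_cdf(u, v, phi):
    """Bivariate case of (3.18), for phi > 0."""
    return (u ** -phi + v ** -phi - 1.0) ** (-1.0 / phi)

def clayton_density(u, v, phi):
    """Bivariate case of (3.19); the leading constant reduces to (1 + phi)."""
    return ((1.0 + phi) * (u * v) ** (-phi - 1.0)
            * (u ** -phi + v ** -phi - 1.0) ** (-1.0 / phi - 2.0))

def mixed_difference(cdf, u, v, phi, h=1e-4):
    """Central finite-difference estimate of d^2 C / (du dv), cf. (3.8)."""
    return (cdf(u + h, v + h, phi) - cdf(u + h, v - h, phi)
            - cdf(u - h, v + h, phi) + cdf(u - h, v - h, phi)) / (4.0 * h * h)

def sample_clayton(n, phi, seed=0):
    """Conditional inversion: draw U, then solve dC/du (v | u) = W for v."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        u = 1.0 - rng.random()  # uniform on (0, 1]
        w = 1.0 - rng.random()  # uniform on (0, 1]
        v = (u ** -phi * (w ** (-phi / (1.0 + phi)) - 1.0) + 1.0) ** (-1.0 / phi)
        pairs.append((u, v))
    return pairs

PHI = 2.0
exact = clayton_density(0.4, 0.7, PHI)
approx = mixed_difference(clayton_cdf, 0.4, 0.7, PHI)
print(f"density (3.19): {exact:.6f}, finite difference of (3.18): {approx:.6f}")

pairs = sample_clayton(20_000, PHI)
both_low = sum(1 for u, v in pairs if u <= 0.5 and v <= 0.5) / len(pairs)
print(f"empirical P(U <= 0.5, V <= 0.5) = {both_low:.4f}, "
      f"C(0.5, 0.5) = {clayton_cdf(0.5, 0.5, PHI):.4f}")
```

Agreement between the two density values, and between the empirical and theoretical joint probabilities, gives some assurance that the CDF, the density and the sampler describe the same copula.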


Frank  copula 


The Frank copula uses the generator function

\[ \psi_\phi(u) = -\log\left( \frac{\exp\{-\phi u\} - 1}{\exp\{-\phi\} - 1} \right), \tag{3.20} \]

which leads to the associated copula CDF

\[ C_{Fr}(u; \phi) = -\frac{1}{\phi} \log\left( 1 + \frac{\prod_{i=1}^{n} \left( \exp\{-\phi u_i\} - 1 \right)}{\left( \exp\{-\phi\} - 1 \right)^{n-1}} \right),
\quad \phi \in \mathbb{R} \setminus \{0\}. \tag{3.21} \]

The $n$-variate copula density is difficult to derive. Archimedean copulas are more useful in
their bivariate form and, as we will see in Chapter 5, construction of multivariate copulas
using bivariate elements leads to a better model, in general. The bivariate copula density is
obtained by setting $n = 2$ and twice differentiating the copula CDF in (3.21). Therefore,

\[ c_{Fr}(u_1, u_2; \phi) = \frac{\phi \left( 1 - \exp\{-\phi\} \right) \exp\{-\phi (u_1 + u_2)\}}
{\left[ 1 - \exp\{-\phi\} - \left( 1 - \exp\{-\phi u_1\} \right)\left( 1 - \exp\{-\phi u_2\} \right) \right]^2}. \tag{3.22} \]


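Gumbel copula

The function

\[ \psi_\phi(u) = (-\log u)^{\phi}, \quad \phi \in [1, \infty), \tag{3.23} \]

generates the Gumbel copula CDF

\[ C_{Gu}(u; \phi) = \exp\left\{ -\left[ \sum_{i=1}^{n} (-\ln u_i)^{\phi} \right]^{1/\phi} \right\}. \tag{3.24} \]

Then, for $n = 2$, we obtain the corresponding bivariate copula density function

\begin{align}
c_{Gu}(u_1, u_2; \phi) = {} & C_{Gu}(u_1, u_2; \phi)\,
\frac{\left[ (-\log u_1)^{\phi} + (-\log u_2)^{\phi} \right]^{-2 + 2/\phi} \left[ (\log u_1)(\log u_2) \right]^{\phi - 1}}{u_1 u_2} \notag \\
& \times \left\{ 1 + (\phi - 1)\left[ (-\log u_1)^{\phi} + (-\log u_2)^{\phi} \right]^{-1/\phi} \right\}. \tag{3.25}
\end{align}

In addition to these copulas, we note that independence is also a valid Archimedean copula,
with $-\log u$ as the generator function.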
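The generator-based construction in (3.13) can be sketched generically. The code below is an illustration under stated assumptions: it builds the copula CDF from a (generator, inverse) pair, using the standard textbook generators for the Clayton, Frank and Gumbel families with positive dependence parameters, and compares the result with closed forms for the CDFs. All function names are introduced here for the example.

```python
import math

def archimedean_cdf(u, generator, inverse):
    """Equation (3.13): apply the inverse generator to the sum of generators."""
    return inverse(sum(generator(ui) for ui in u))

def clayton(phi):
    """(generator, inverse) pair for the Clayton family, phi > 0."""
    return (lambda t: (t ** -phi - 1.0) / phi,
            lambda s: (1.0 + phi * s) ** (-1.0 / phi))

def frank(phi):
    """(generator, inverse) pair for the Frank family, phi > 0."""
    return (lambda t: -math.log(math.expm1(-phi * t) / math.expm1(-phi)),
            lambda s: -math.log1p(math.exp(-s) * math.expm1(-phi)) / phi)

def gumbel(phi):
    """(generator, inverse) pair for the Gumbel family, phi >= 1."""
    return (lambda t: (-math.log(t)) ** phi,
            lambda s: math.exp(-s ** (1.0 / phi)))

# Independence copula: generator -log u, inverse exp(-s).
INDEPENDENCE = (lambda t: -math.log(t), lambda s: math.exp(-s))

def clayton_closed_form(u, phi):
    """Closed-form n-variate Clayton CDF."""
    return (sum(ui ** -phi for ui in u) - len(u) + 1.0) ** (-1.0 / phi)

def frank_closed_form(u, phi):
    """Closed-form n-variate Frank CDF."""
    numerator = math.prod(math.expm1(-phi * ui) for ui in u)
    return -math.log1p(numerator / math.expm1(-phi) ** (len(u) - 1)) / phi

def gumbel_closed_form(u, phi):
    """Closed-form n-variate Gumbel CDF."""
    return math.exp(-sum((-math.log(ui)) ** phi for ui in u) ** (1.0 / phi))

u = (0.3, 0.6, 0.8)
print("independence:", archimedean_cdf(u, *INDEPENDENCE), "product:", math.prod(u))
print("Clayton:", archimedean_cdf(u, *clayton(2.0)), clayton_closed_form(u, 2.0))
print("Frank:  ", archimedean_cdf(u, *frank(3.0)), frank_closed_form(u, 3.0))
print("Gumbel: ", archimedean_cdf(u, *gumbel(1.5)), gumbel_closed_form(u, 1.5))
```

Note that the Gumbel generator with $\phi = 1$ coincides with the independence generator, so the Gumbel family contains independence as a boundary case.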


3.2.2  Copulas  and  measures  of  dependence 

For a joint bivariate CDF expressed as a copula, some interesting observations can be made
about the various measures of dependence introduced in Section 3.1.2. For the random pair
$(X, Y)$, Kendall’s $\tau$ and Spearman’s $\rho_s$ are respectively expressed as the following expectations:

\begin{align}
\tau &= 4\,E[C(F_X(X), F_Y(Y))] - 1 \tag{3.26} \\
\rho_s &= 12\,E[F_X(X) F_Y(Y)] - 3 \tag{3.27}
\end{align}

For the case of elliptical copulas, parametrized by the matrix $\Sigma = [\rho_\Sigma(i, j)]$,

\[ \rho_\Sigma(i, j) = \sin\left( \frac{\pi\, \tau_{ij}}{2} \right), \tag{3.28} \]

where $\tau_{ij}$ is the Kendall’s $\tau$ evaluated for the pair $(U_i, U_j)$.

Blomqvist’s $\beta$ defined in Eq. (3.5) can be expressed in terms of the bivariate copula, $C$, for
the pair $(X, Y)$ as

\[ \beta = 4 F_{XY}(\tilde{x}, \tilde{y}) - 1 = 4\,C\left(\tfrac{1}{2}, \tfrac{1}{2}\right) - 1. \tag{3.29} \]

Nelsen [65] notes that although $\beta$ depends only on the value of the copula at the center of
$[0, 1] \times [0, 1]$, it can provide good approximations of $\tau$ and $\rho_s$ using, e.g., a Maclaurin series
expansion.
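The relations above can be explored empirically with sample versions of the rank measures. The sketch below is an illustration only: it implements simple estimators of Kendall’s $\tau$, Spearman’s $\rho_s$ and Blomqvist’s $\beta$, applies them to simulated bivariate Gaussian data (whose copula is the Gaussian copula), and recovers the correlation parameter through (3.28). The sample size, seed, correlation value and function names are arbitrary choices made here; the $O(n^2)$ estimator of $\tau$ is chosen for clarity, not speed.

```python
import math
import random
import statistics

def kendall_tau(xs, ys):
    """Sample Kendall's tau: (concordant - discordant) / number of pairs."""
    n = len(xs)
    score = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            prod = (xs[i] - xs[j]) * (ys[i] - ys[j])
            score += (prod > 0) - (prod < 0)
    return 2.0 * score / (n * (n - 1))

def ranks(values):
    """Ranks 1..n; ties are not expected for continuous data."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return out

def spearman_rho(xs, ys):
    """Sample Spearman's rho via the classical rank-difference formula (no ties)."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

def blomqvist_beta(xs, ys):
    """Sample version of (3.5), with sample medians as the center."""
    mx, my = statistics.median(xs), statistics.median(ys)
    agree = sum(1 for x, y in zip(xs, ys) if (x - mx) * (y - my) > 0)
    return 2.0 * agree / len(xs) - 1.0

def gaussian_pairs(rho, n, seed=0):
    """Simulate n pairs from a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(z1)
        ys.append(rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    return xs, ys

RHO = 0.7
xs, ys = gaussian_pairs(RHO, 800)
tau = kendall_tau(xs, ys)
rho_from_tau = math.sin(math.pi * tau / 2.0)  # relation (3.28)
# Increasing transformations of each variable leave rank measures unchanged.
tau_transformed = kendall_tau([math.exp(x) for x in xs], [y ** 3 for y in ys])
print(f"tau = {tau:.3f}, rho recovered from tau = {rho_from_tau:.3f}")
print(f"tau after transformations = {tau_transformed:.3f}")
print(f"rho_s = {spearman_rho(xs, ys):.3f}, beta = {blomqvist_beta(xs, ys):.3f}")
```

Estimating $\tau$ and inverting (3.28) avoids a likelihood search over the correlation parameter, which is the computational saving referred to in the chapter summary.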

Local measures of dependence discussed earlier also reveal interesting properties when
expressed in terms of a copula. When the expectation is restricted to an open neighborhood
$V(x_0, y_0)$, local forms of $\tau$ and $\rho_s$ are defined as

\begin{align}
\tau(x_0, y_0) &= 4 \iint_{V(x_0, y_0)} C(u, v)\,du\,dv - 1 \tag{3.30} \\
\rho_s(x_0, y_0) &= 12 \iint_{V(x_0, y_0)} \left( C(u, v) - uv \right) du\,dv \tag{3.31}
\end{align}

In Section 3.1.2, a local measure of likelihood ratio dependence (LRD) was defined as

\[ \gamma(x, y) = \frac{\partial^2}{\partial x\,\partial y} \log f_{XY}(x, y), \]

and it was noted that this measure is marginal free. Consequently, $\gamma(x, y)$ equals $\gamma(u, v)$, where

\[ \gamma(u, v) = \frac{\partial^2}{\partial u\,\partial v} \log c(u, v), \quad F_X(x) = u,\ F_Y(y) = v, \tag{3.32} \]

and $c(u, v)$ is the copula density function for copula $C$.


3.2.3 Tail dependence coefficients as a measure of extremal dependence

Extremal dependence is the characterization of statistical co-movement for extreme values of
multivariate data. In the context of bivariate data, tail dependence coefficients are a natural
measure of extremal dependence. Two measures, the upper and lower tail dependence
coefficients, have been defined in the literature, and they measure the amount of dependence in the
upper and lower quadrant tails of the support of the random vector. Let $[X, Y]$ be a vector of
continuous random variables with marginal CDFs $F$ and $G$. Let $C(F(X), G(Y))$ be a bivariate
copula distribution function. Then,

\begin{align}
\lambda_U &\triangleq \lim_{u \nearrow 1} P\left( Y > G^{-1}(u) \mid X > F^{-1}(u) \right) \tag{3.33} \\
          &= \lim_{u \nearrow 1} \frac{1 - 2u + C(u, u)}{1 - u} \tag{3.34} \\
\lambda_L &\triangleq \lim_{u \searrow 0} P\left( Y \leq G^{-1}(u) \mid X \leq F^{-1}(u) \right) \tag{3.35} \\
          &= \lim_{u \searrow 0} \frac{C(u, u)}{u} \tag{3.36}
\end{align}

Using these relations, one can show that, for the Gaussian copula,

\[ \lambda_L = \lambda_U = 2 \lim_{x \to -\infty} F_{\mathcal{N}}\left( x\, \frac{\sqrt{1 - \rho}}{\sqrt{1 + \rho}} \right) = 0. \]

The $t$-copula, on the other hand, exhibits non-zero upper and lower tail dependence, i.e.,

\[ \lambda_L = \lambda_U = 2\, t_{\nu + 1}\left( -\sqrt{\nu + 1}\, \frac{\sqrt{1 - \rho}}{\sqrt{1 + \rho}} \right), \]

where $t_{\nu + 1}$ denotes the CDF of a univariate $t$ distribution with $\nu + 1$ degrees of freedom. Hence,
for large values of $\rho$ and small values of $\nu$ the $t$-copula exhibits strong tail dependence.
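The limits in (3.34) and (3.36) can be approximated numerically by evaluating the ratios close to the corners of the unit square. The sketch below is an illustration only, using the diagonal sections $C(u, u)$ of the bivariate Clayton and Gumbel copulas from Section 3.2.1 and of the independence copula. The reference values $2^{-1/\phi}$ (Clayton, lower tail) and $2 - 2^{1/\phi}$ (Gumbel, upper tail) are standard results that are not stated in this chapter; the evaluation points and function names are choices made here.

```python
import math

def upper_tail_ratio(diagonal, u):
    """The ratio in (3.34); its limit as u -> 1 is lambda_U."""
    return (1.0 - 2.0 * u + diagonal(u)) / (1.0 - u)

def lower_tail_ratio(diagonal, u):
    """The ratio in (3.36); its limit as u -> 0 is lambda_L."""
    return diagonal(u) / u

def clayton_diagonal(phi):
    """C(u, u) for the bivariate Clayton copula, phi > 0."""
    return lambda u: (2.0 * u ** -phi - 1.0) ** (-1.0 / phi)

def gumbel_diagonal(phi):
    """C(u, u) for the bivariate Gumbel copula, phi >= 1."""
    return lambda u: math.exp(-(2.0 * (-math.log(u)) ** phi) ** (1.0 / phi))

def independence_diagonal(u):
    """C(u, u) = u^2 for independent variables."""
    return u * u

PHI = 2.0
NEAR_ZERO, NEAR_ONE = 1e-6, 1.0 - 1e-6

clayton_lower = lower_tail_ratio(clayton_diagonal(PHI), NEAR_ZERO)
clayton_upper = upper_tail_ratio(clayton_diagonal(PHI), NEAR_ONE)
gumbel_upper = upper_tail_ratio(gumbel_diagonal(PHI), NEAR_ONE)
gumbel_lower = lower_tail_ratio(gumbel_diagonal(PHI), NEAR_ZERO)
indep_lower = lower_tail_ratio(independence_diagonal, NEAR_ZERO)

print(f"Clayton: lower ~ {clayton_lower:.4f} (2^(-1/phi) = {2 ** (-1 / PHI):.4f}), "
      f"upper ~ {clayton_upper:.6f}")
print(f"Gumbel:  upper ~ {gumbel_upper:.4f} (2 - 2^(1/phi) = {2 - 2 ** (1 / PHI):.4f}), "
      f"lower ~ {gumbel_lower:.6f}")
print(f"Independence: lower ~ {indep_lower:.6f}")
```

The asymmetry is the practical point: the Clayton copula concentrates dependence in the lower tail and the Gumbel copula in the upper tail, so the choice of family determines which kind of joint extreme event the model can represent.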

3.3  Summary 

In this chapter, we have seen that copulas are able to provide a complete characterization of
statistical dependence, largely because of their functional nature. Additionally, for many families,
there exists a one-to-one relationship between the copula dependence parameters and
nonparametric rank-based measures of dependence, such as Kendall’s tau. The use of these measures in
inference leads to large savings in computational effort, as compared to optimal approaches
such as maximum likelihood. The copula-based approach also allows us to characterize
extremal dependence through the concept of tail dependence. Selecting and using copulas that
possess non-zero tail dependence plays an important role in inference problems. These issues
are discussed in further detail in the next chapter, in the context of inference using heavy-tailed
α-stable sensor models.

Chapter 4

Detection of Dependent
Heavy-tailed Data


In this chapter, we take the first steps toward formulating and deriving the theory for spatially
dependent heavy-tailed signals, using a copula-based approach for dependence modeling. When
extreme value measurements occur at a significantly greater frequency than is attributable to
distributions that decay exponentially in the tail, polynomial tail-decay models often provide
an appropriate fit. These models can accommodate the typical “spiky” signatures in the signal
measurements, and such data are often said to be fat-tailed or heavy-tailed. Examples of such
data are seen in applications such as climatology [24], finance [99], and well-established signal
processing applications such as radar, communications and image processing [3, 13, 45]. The
co-occurrence of such (rare) extreme-valued data is sometimes symptomatic of a catastrophic
event, and its detection, therefore, needs appropriate modeling tools.

The heavy-tailed characteristics in these applications are often modeled using a class of
functions known as α-stable distributions. Excluding the Lévy, Cauchy and Gaussian
distributions, the α-stable family does not admit a closed-form probability density function (p.d.f.).
These distributions are instead defined using characteristic functions [67, 81]. This chapter
examines the problem of detection of spatially dependent α-stable signals. We consider a setup
where the data coming from all sensors are α-stable distributed, but are non-identically
distributed. In this sense, the sensors are heterogeneous. As discussed in Chapter 1, this
heterogeneity can have many causes.

4.1  Introduction 

The α-stable model is motivated by the empirical observation that several non-Gaussian
phenomena exhibit a power-law decay model with a tail of the type $|x|^{-\alpha - 1}$, $\alpha \in (0, 2)$; $\alpha$ is
referred to as the tail-index. Further, Gnedenko and Kolmogorov [32] proved a generalized
central limit theorem (CLT) for random variables that possess this power-law tail decay
property. This theorem states that the limiting distribution of the sum of power-law heavy-tailed
random variables belongs to the class of α-stable distributions. In addition to $\alpha$, this
class of distributions has three additional parameters corresponding to location ($\delta$), scale ($\gamma$)
and skewness ($\beta$). This allows for flexible modeling of various types of non-Gaussian data.
If $\beta = 0$, one obtains an important special case called the symmetric α-stable distribution,
often denoted as SαS. A formal definition and brief introduction to the theory of α-stable
distributions is presented in Section 4.2.1.
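Although α-stable densities generally lack a closed form, symmetric α-stable variates are easy to simulate. The sketch below is an illustration only: it uses the Chambers-Mallows-Stuck method, a standard simulation technique that is not part of this dissertation, restricted to the symmetric case ($\beta = 0$). It assumes the common parametrization in which the characteristic function is $\exp(-\gamma^\alpha |t|^\alpha)$ for zero location, so that $\alpha = 2$ gives a Gaussian with variance $2\gamma^2$ and $\alpha = 1$ gives a Cauchy distribution; the definition in Section 4.2.1 may use a different convention. Function and parameter names are introduced here.

```python
import math
import random

def sample_sas(alpha, n, scale=1.0, location=0.0, seed=0):
    """Draw n symmetric alpha-stable variates with the Chambers-Mallows-Stuck method."""
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must lie in (0, 2]")
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        theta = math.pi * (rng.random() - 0.5)  # uniform angle on [-pi/2, pi/2)
        unit = rng.random()
        while unit == 0.0:                      # avoid log(0)
            unit = rng.random()
        w = -math.log(unit)                     # unit-rate exponential, strictly positive
        if alpha == 1.0:
            x = math.tan(theta)                 # Cauchy special case
        else:
            x = (math.sin(alpha * theta) / math.cos(theta) ** (1.0 / alpha)
                 * (math.cos((1.0 - alpha) * theta) / w) ** ((1.0 - alpha) / alpha))
        samples.append(location + scale * x)
    return samples

def exceedance_fraction(samples, threshold):
    """Fraction of samples whose magnitude exceeds the threshold."""
    return sum(1 for s in samples if abs(s) > threshold) / len(samples)

gaussian_like = sample_sas(2.0, 50_000)          # alpha = 2: Gaussian, variance 2
heavy_tailed = sample_sas(1.5, 50_000, seed=1)   # alpha = 1.5: heavier tails
print("P(|X| > 6), alpha = 2.0:", exceedance_fraction(gaussian_like, 6.0))
print("P(|X| > 6), alpha = 1.5:", exceedance_fraction(heavy_tailed, 6.0))
```

Comparing the exceedance fractions for the two values of $\alpha$ shows the "spiky" behavior described above: large excursions that are practically absent in the Gaussian case occur regularly when $\alpha < 2$.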

Introductory discussions of α-stable processes from a signal processing perspective have
focused on independent and identically distributed (IID) formulations [6, 84]. Detection in
the presence of IID SαS noise was investigated using Bayesian and Neyman-Pearson
approaches [95, 96], where fractional lower order moments (FLOM) were used to estimate
unknown parameters. Kuruoglu et al. [54] have used a mixture-of-Gaussians approximation for
SαS noise. Swami and Sadler [91] used higher order statistics for estimating and detecting
signals in SαS noise with unknown parameters. More recently, different authors have explored
the use of α-stable models in distributed detection [73], acoustic tracking [110], anomaly
detection [85], wireless communications [76, 80] and biomedical applications [56]. An
extensive bibliography on α-stable distributions/processes and their applications is maintained by
Nolan [68].

In this chapter, we consider a detection problem using data from sensors configured in a
parallel topology. As indicated in Chapter 1, heterogeneous sensors observe a common
phenomenon. Their observations may be made over an arbitrary domain of measurement. For
example, these measurements may represent a time series (temporal measurements), a sequence
of spectral coefficients (measurements in the frequency domain), or some other feature vector. In
their respective measurement domains, sensor observations are modeled as IID α-stable
random variables (e.g., temporally independent or independent spectral coefficients). The sensor
signal model is kept quite general, i.e., we do not explicitly specify whether the phenomenon of
interest is embedded in IID α-stable noise or if the α-stable model characterizes the dynamics
of the phenomenon itself.

Since the sensors jointly measure the same process, their measurements are spatially
dependent (i.e., across sensors). We use copulas to model this dependence (see Chapter 3). This
α-stable-copula model serves as the focal point of our investigation of detection of dependent
heavy-tailed data. The generality of our signal model and the copula-based dependence
formulation distinguish this work from previous works, such as [73], which have specifically
considered conditionally independent sensor observations embedded in α-stable noise.

Multivariate $\alpha$-stable models have also been defined and used for inference on random
vectors with heavy tails (e.g., see [69,70,75,81]). A multivariate $\alpha$-stable model generalizes
the univariate $\alpha$-stable law and is defined using a joint characteristic function. Consequently,
with the exception of a few applications¹, obtaining the resultant $\alpha$-stable marginal densities
is not computationally tractable. In contrast, the copula approach allows for the synthesis of
a joint distribution based on pre-specified, possibly heterogeneous, marginal models. Recall
that copulas are parametric probability distributions that couple univariate marginals to
generate a valid joint distribution that incorporates statistical dependence. In this chapter, we utilize
several families of copula functions, which can characterize non-linear and asymmetric
dependencies. Copula-based methods of inference also scale well across multisensor or
multidimensional formulations. This should be contrasted with completely nonparametric formulations,
such as learning-based techniques, which are known to suffer from scalability issues stemming
from the curse of dimensionality.

¹ See Nolan [68, 69] and references cited therein.

In the following sections, we develop the idea of distributed signal detection, using a
copula-based characterization of dependence, for $\alpha$-stable data. Section 4.2 lays out the canonical
signal model for spatially dependent $\alpha$-stable data. The detection problem is formulated in
Section 4.3, and variations of the likelihood ratio test under the Neyman-Pearson framework
are studied. The proposed detection schemes are applied to simulated data and the results thus
obtained are discussed in Section 4.4.


4.2  Signal  Model 

We consider a two-sensor system, where each sensor transmits its analog measurements or
observation data to the fusion center (FC). The two-sensor restriction is without loss of generality:
the  theory  developed  in  this  chapter  (Propositions  4.1,  4.3  and  4.4)  readily  extends  to  multiple 
sensors  and,  significantly,  our  main  conclusions  do  not  depend  upon  the  number  of  sensors. 
The  two-sensor  formulation  allows  us  to  minimize  the  notational  complexity  in  the  exposition 
of  the  theory. 

Sensor $i \in \{1, 2\}$ transmits $\{x_{ij}\}_{j=1}^{N}$, a sequence of $N$ IID measurements, to the FC. Each
$x_{ij}$ is a realization of the random variable $X_i$, where $j$ indexes the measurement domain, which
can be time, frequency, or any other feature. The sensor model is the p.d.f. $f_{X_i}$, which is
characterized using $\alpha$-stable distributions (Section 4.2.1).

Denote the $j$-th observation pair as $\mathbf{x}_j = [x_{1j}, x_{2j}]^T$, where $[\cdot]^T$ denotes matrix/vector
transpose. In general, the random vector $\mathbf{X} = [X_1, X_2]^T$ has a joint density
$f_{\mathbf{X}}(\mathbf{x}_j) \neq f_{X_1}(x_{1j}) \cdot f_{X_2}(x_{2j})$, i.e., sensor observations are spatially dependent. This inter-sensor
dependence is modeled using copulas (Section 3.2). For $\mathbf{x} = \{\mathbf{x}_j\}_{j=1}^{N}$,
$f_{\mathbf{X}}(\mathbf{x}) = \prod_{j=1}^{N} f_{\mathbf{X}}(\mathbf{x}_j)$, i.e., sensor data are independent across $j$. It is not necessary that
$f_{X_1} = f_{X_2}$, i.e., the sensors may be heterogeneous. The $\alpha$-stable p.d.f. $f_{X_i}$ is also referred to
as the marginal density since we can obtain each sensor model by marginalizing $f_{\mathbf{X}}$; the $\alpha$-stable
parameters corresponding to the marginal p.d.f.s are called the marginal parameters.

The FC uses the received data to calculate a test-statistic, which is compared to a threshold.
Under the Neyman-Pearson framework, the threshold is chosen such that the probability of
detection, $P_D$, is maximized under a constraint on $P_F$, the probability of false alarm.

4.2.1  Stable  distributions 

We model $X_i$ as an $\alpha$-stable random variable. An $\alpha$-stable distribution (also referred to simply
as a stable distribution) does not necessarily have a closed-form p.d.f. Instead, it is defined in
closed form by its characteristic function (CF),

$$\varphi_{X_i}(t) = \exp\left(-\gamma^{\alpha} |t|^{\alpha} B_{\alpha}(t) + i \delta t\right) \tag{4.1}$$

$$B_{\alpha}(t) = \begin{cases} 1 - i\beta \tan(\pi\alpha/2)\, \mathrm{sgn}(t), & \text{for } \alpha \neq 1 \\ 1 + i\beta (2/\pi)\, \mathrm{sgn}(t) \log|t|, & \text{for } \alpha = 1 \end{cases} \tag{4.2}$$

where $i = \sqrt{-1}$, $\alpha \in (0, 2]$, $\beta \in [-1, 1]$, $\gamma > 0$ and $\delta \in \mathbb{R}$. The parameters $\alpha$, $\beta$, $\gamma$ and $\delta$ are,
respectively, the tail-index, skewness, dispersion and location parameters. The CF, $\varphi_{X_i}(t)$, and
p.d.f., $f_{X_i}(x_{ij})$, are Fourier transform pairs.
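
As a numerical illustration of (4.1) and (4.2), the following Python sketch evaluates the CF and inverts it with a truncated trapezoidal rule to approximate the p.d.f. The function names `stable_cf` and `stable_pdf`, the truncation point `t_max` and the grid size `n` are illustrative assumptions and are not part of the original development. The truncation is adequate for the Cauchy and Gaussian cases, but it loses accuracy for small $\alpha$, where the CF decays slowly.

```python
import cmath
import math


def stable_cf(t, alpha, beta, gamma=1.0, delta=0.0):
    """Characteristic function of S(alpha, beta, gamma, delta), following (4.1)-(4.2)."""
    sgn = (t > 0) - (t < 0)
    if alpha != 1.0:
        b = 1.0 - 1j * beta * math.tan(math.pi * alpha / 2.0) * sgn
    else:
        # sgn(0) = 0, so the log term is dropped at t = 0 to avoid log(0).
        log_abs_t = math.log(abs(t)) if t != 0.0 else 0.0
        b = 1.0 + 1j * beta * (2.0 / math.pi) * sgn * log_abs_t
    return cmath.exp(-(gamma ** alpha) * abs(t) ** alpha * b + 1j * delta * t)


def stable_pdf(x, alpha, beta, gamma=1.0, delta=0.0, t_max=50.0, n=20000):
    """Approximate f(x) = (1/pi) * int_0^inf Re[phi(t) exp(-i t x)] dt (trapezoidal rule).

    The half-line form is valid because phi(-t) is the complex conjugate of phi(t).
    """
    dt = t_max / n
    total = 0.0
    for k in range(n + 1):
        t = k * dt
        val = (stable_cf(t, alpha, beta, gamma, delta) * cmath.exp(-1j * t * x)).real
        total += 0.5 * val if k in (0, n) else val
    return total * dt / math.pi


if __name__ == "__main__":
    # Cauchy case (alpha = 1, beta = 0): compare with 1 / (pi * (1 + x^2)).
    x = 0.5
    print(stable_pdf(x, 1.0, 0.0), 1.0 / (math.pi * (1.0 + x * x)))
```

For the two closed-form cases used here, the inversion can be checked against the Cauchy density and against the normal density with variance $2\gamma^2$ that corresponds to $\alpha = 2$.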

We denote the distribution of $X_i$ as

$$X_i \sim S(\alpha, \beta, \gamma, \delta). \tag{4.3}$$

The standard form refers to the case where $\delta = 0$ and $\gamma = 1$, so that $\gamma^{-1}(X_i - \delta) \sim S(\alpha, \beta, 1, 0)$.
The support of $f_{X_i}$ depends on the values of $\alpha$, $\beta$ and $\delta$ [67]:

$$\mathrm{supp}[f_{X_i}] = \begin{cases} [\delta, \infty) & \text{for } \alpha < 1,\ \beta = 1, \\ (-\infty, \delta] & \text{for } \alpha < 1,\ \beta = -1, \\ \mathbb{R} & \text{otherwise.} \end{cases} \tag{4.4}$$

Remarks 

Some special cases and properties of $\alpha$-stable distributions are as follows:

• Closed-form p.d.f. A closed-form p.d.f. exists for three special cases: Cauchy ($\alpha = 1$,
$\beta = 0$), Lévy ($\alpha = 0.5$, $\beta = 1$), and normal ($\alpha = 2$) distributions.

• Existence of moments. For $\alpha < 2$, the $m$-th order moment exists only if $m \in (0, \alpha)$. For example,
for the Cauchy distribution neither mean nor variance is defined since $\alpha = 1$.

• Fractional order moments. Analogous to $L_p$ norms for non-integer values of $p < 2$,
typically considered in robust control, the $p$-th order fractional moments (see [84]) of an
$\alpha$-stable random variable are well defined, i.e.,

$$E[|X_i|^p] < \infty \quad \text{for } p < \alpha.$$

For $p \geq \alpha$, $E[|X_i|^p] = \infty$. These $p$-th order moments are also called fractional lower
order moments (FLOM). Parameter estimation based on FLOM has been an active area
of research [6].

• Symmetry. As noted in Section 4.1, $\beta = 0$ and $\delta = d$ imply a distribution that is symmetric about
the median $d$. This is also called a symmetric $\alpha$-stable distribution, denoted as S$\alpha$S. For
the S$\alpha$S case, when the mean is not admissible, $\delta$ corresponds to the median [95].

4.2.2 Dependent stable signals

For the two-sensor problem under consideration, recall that $X_i$ is distributed as in (4.3), i.e.,
$X_1 \sim S(\alpha_1, \beta_1, \gamma_1, \delta_1)$ and $X_2 \sim S(\alpha_2, \beta_2, \gamma_2, \delta_2)$. Using the vector notation

$$\psi_i = [\alpha_i, \beta_i, \gamma_i, \delta_i]^T, \quad i = 1, 2, \tag{4.5}$$

the marginal density for the $j$-th observation from sensor $i$ is $f_{X_i}(x_{ij}; \psi_i)$. Consider an
arbitrary, possibly unknown copula, $c$, parametrized by a $d$-dimensional column vector, $\phi_c$. The
dimension, $d$, and the properties of $\phi_c$ depend on the definition of the specific copula, $c$. Denote
the probability integral transform for the copula argument as

$$u_{ij}(\psi_i) = F_{X_i}(x_{ij}; \psi_i), \quad i = 1, 2. \tag{4.6}$$

Thus, for dependent $\alpha$-stable signals, (3.7) can be rewritten as

$$\begin{aligned} f_{\mathbf{X}}(\mathbf{x}; \theta) &= \prod_{j=1}^{N} f_{X_1}(x_{1j}; \psi_1)\, f_{X_2}(x_{2j}; \psi_2) \times c\left(F_{X_1}(x_{1j}; \psi_1), F_{X_2}(x_{2j}; \psi_2); \phi_c\right) \\ &= \prod_{j=1}^{N} f_{X_1}(x_{1j}; \psi_1)\, f_{X_2}(x_{2j}; \psi_2) \times c\left(u_{1j}(\psi_1), u_{2j}(\psi_2); \phi_c\right) \end{aligned} \tag{4.7}$$

where the column vector

$$\theta = [\psi_1\ \ \psi_2\ \ \phi_c]^T$$

is contained in the parameter space, $\Theta_c$, defined as the product set of the respective component
marginal and copula parameter spaces, $\Psi_i$ and $\Phi_c$. That is,

$$\Theta_c \triangleq \Psi_1 \times \Psi_2 \times \Phi_c.$$

This serves as the canonical signal model or data generating process (DGP), which is further
qualified in terms of the null and alternative hypotheses for the detection problem. All notations
leading to the DGP, as well as those appearing in Section 4.3, are summarized in Table 4.1.
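
The structure of (4.7) can be sketched in Python for one concrete choice of marginals and copula. The sketch below assumes Cauchy marginals (the stable case $\alpha = 1$, $\beta = 0$, which has a closed-form p.d.f. and CDF) and a Clayton copula restricted to $\phi > 0$; both choices and all function names are assumptions made for illustration only.

```python
import math


def cauchy_pdf(x, gamma, delta):
    """Cauchy p.d.f., i.e. the S(1, 0, gamma, delta) marginal density."""
    z = (x - delta) / gamma
    return 1.0 / (math.pi * gamma * (1.0 + z * z))


def cauchy_cdf(x, gamma, delta):
    """Cauchy CDF, used as the probability integral transform in (4.6)."""
    return 0.5 + math.atan((x - delta) / gamma) / math.pi


def clayton_cdf(u, v, phi):
    """Bivariate Clayton copula CDF for phi > 0."""
    return (u ** -phi + v ** -phi - 1.0) ** (-1.0 / phi)


def clayton_density(u, v, phi):
    """Bivariate Clayton copula density (mixed partial derivative of the CDF)."""
    s = u ** -phi + v ** -phi - 1.0
    return (1.0 + phi) * (u * v) ** (-(1.0 + phi)) * s ** (-(2.0 + 1.0 / phi))


def joint_loglik(pairs, psi1, psi2, phi):
    """Log of (4.7): marginal log-densities plus the log copula density.

    pairs is a sequence of (x_1j, x_2j); psi1 and psi2 are (gamma, delta) tuples.
    """
    total = 0.0
    for x1, x2 in pairs:
        u1 = cauchy_cdf(x1, *psi1)
        u2 = cauchy_cdf(x2, *psi2)
        total += math.log(cauchy_pdf(x1, *psi1)) + math.log(cauchy_pdf(x2, *psi2))
        total += math.log(clayton_density(u1, u2, phi))
    return total


if __name__ == "__main__":
    data = [(0.2, -0.5), (1.5, 3.0), (-2.0, 0.1)]
    print(joint_loglik(data, (1.0, 0.0), (2.0, 0.5), phi=2.0))
```

As $\phi \to 0^{+}$ the Clayton density tends to one, so the joint log-likelihood reduces to the sum of the marginal log-likelihoods, which is the product-copula (independent) case.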

4.3  The  detection  problem 

We formulate the detection problem as a test of hypotheses

$$H_0: f^0_{\mathbf{X}} = f_{\mathbf{X}}(\mathbf{x}_j; \theta_0) \quad \text{vs.} \quad H_1: f^1_{\mathbf{X}} = f_{\mathbf{X}}(\mathbf{x}_j; \theta_1), \tag{4.8}$$

$f^0_{\mathbf{X}} \neq f^1_{\mathbf{X}}$, $\forall\, j = 1, 2, \ldots, N$. The parameters under the null and alternative hypotheses are,
respectively,

$$\theta_0 = [\psi_{10}\ \ \psi_{20}\ \ \phi_{c_0}]^T \tag{4.9}$$

and

$$\theta_1 = [\psi_{11}\ \ \psi_{21}\ \ \phi_{c_1}]^T. \tag{4.10}$$

In (4.8), we assume that $H_0$ is completely specified, whereas $H_1$ is composite. Such a
formulation is frequently encountered in applications such as anomaly detection, where the “normal”
operational state of a process is known a priori. Specifically, $\theta_0$ is a fixed and known point
in the parameter space $\Theta_{c_0}$, defined for the (known) copula $c_0$. $\theta_1 \in \Theta_{c_1}$ is deterministic but
unknown, such that the distribution parameters as well as the copula function under $H_1$, $c_1$, are
unknown. The space $\Theta_{c_1}$ is not defined completely since $c_1$ is assumed to be unknown.
Therefore, the formulation in (4.8) leads to a test over the parameter space, as well as the space of
copula functions.

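In order to simplify the problem, we consider a copula library, $\mathcal{C}$, containing candidate
copulas, defined over an indexing set, $\mathcal{M}$. For the applications discussed in Section 4.4.3, we
use the copulas listed in Table 4.2. These are among the most commonly applied copulas in the
literature [17]. Note that each copula is defined as a bivariate CDF ($m = 2$); the corresponding density
function is obtained by using (3.8).

Table 4.1: Symbols and Notations

| Notation | Description |
|---|---|
| $j$ | Measurement index, $j = 1, 2, \ldots, N$ |
| $x_{ij}$ | $j$-th observation from $i$-th sensor, $i = 1, 2$ |
| $X_i$ | $\alpha$-stable random variable corresponding to $x_{ij}$ |
| $\psi_i$ or $\psi_{ik}$ | Vector of $\alpha$-stable parameters for sensor $i$ and hypothesis $k \in \{0, 1\}$, where specified; see (4.5) |
| $f_{X_i}$ | Sensor model or p.d.f. of $X_i$ |
| $F_{X_i}$ | CDF of $X_i$; also see (4.6) |
| $\mathbf{X}$, $\mathbf{x}_j$, $\mathbf{x}$ | Random vector $[X_1, X_2]^T$, its $j$-th sample realization, $\mathbf{x}_j = [x_{1j}, x_{2j}]^T$, and the sequence, $\mathbf{x} = \{\mathbf{x}_j\}_{j=1}^{N}$ |
| $\phi_c$ | Dependence parameter for copula $c$ |
| $\theta$ or $\theta_k$ | Joint parameter vector $[\psi_1, \psi_2, \phi_c]^T$ or, specifically, $[\psi_{1k}, \psi_{2k}, \phi_{c_k}]^T$ under $H_k$ |
| $f_{\mathbf{X}}$ | Joint p.d.f. of $\mathbf{X}$ expressed as a product of $\alpha$-stable marginals $f_{X_1}$, $f_{X_2}$ and copula $c$; see (4.7) |
| $f^k_{\mathbf{X}}$ | Joint p.d.f. under hypothesis $k$ |
| $\mathcal{C}$ | Copula function library; see Table 4.2 |
| $c^*$ | Copula selected using ML-based copula selection |
| $\hat{\psi}_i$, $\hat{\phi}_{c^*}$ | ML estimates of marginal parameters $\psi_i$ and copula parameter $\phi_{c^*}$ |
| $\hat{\theta}^*$ | Concatenated vector $[\hat{\psi}_1, \hat{\psi}_2, \hat{\phi}_{c^*}]^T$ |
| $\hat{f}_{\mathbf{X}}$ | Joint p.d.f. obtained using $c^*$ as the copula with ML parameter estimates, i.e., $f_{\mathbf{X}}(\mathbf{x}_j; \hat{\theta}^*)$ |
| $\tilde{\theta}_k$ | Pseudo-true value, evaluated under hypothesis $H_k$, when $f_{\mathbf{X}}$ is not the data generating process |
| $\tilde{f}^k_{\mathbf{X}}$ | Joint p.d.f. evaluated at $\tilde{\theta}_k$, i.e., $f_{\mathbf{X}}(\mathbf{x}_j; \tilde{\theta}_k)$ |

Table 4.2: Library of copula functions

| Copula | Parametric CDF | Parameter range |
|---|---|---|
| Gaussian | $F_G\left(F_G^{-1}(u_1), \ldots, F_G^{-1}(u_m); \Sigma\right)$ | $\Sigma = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}$, $\rho \in [-1, 1]$ |
| Student-t | $t_{\nu,\Sigma}\left(t_{\nu}^{-1}(u_1), \ldots, t_{\nu}^{-1}(u_m)\right)$ | $\nu \geq 3$ |
| Clayton | $\left(\sum_{i=1}^{m} u_i^{-\phi} - 1\right)^{-1/\phi}$ | $\phi \in [-1, \infty) \setminus \{0\}$ |
| Frank | $-\frac{1}{\phi} \log\left(1 + \frac{\prod_{i=1}^{m} (e^{-\phi u_i} - 1)}{e^{-\phi} - 1}\right)$ | $\phi \in \mathbb{R} \setminus \{0\}$ |
| Gumbel | $\exp\left(-\left(\sum_{i=1}^{m} (-\ln u_i)^{\phi}\right)^{1/\phi}\right)$ | $\phi \in [1, \infty)$ |
| Product | $\prod_{i=1}^{m} u_i$ | (none) |

$F_G(\mathbf{x}; \Sigma)$: multivariate normal CDF with mean 0 and covariance $\Sigma$;
$F_G^{-1}(x_i)$: inverse univariate normal CDF with mean 0 and variance 1;
$t_{\nu,\Sigma}$: multivariate Student-t CDF; $t_{\nu}^{-1}$: inverse CDF of univariate Student-t.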

Since the copula corresponding to the DGP under $H_1$ is unknown, a best fit is selected from
the functions contained in $\mathcal{C}$. Therefore, the hypothesis testing problem implicitly contains a
model selection component in the formulation, in which we attempt to identify the “best”
copula, $c^*(\cdot\,; \hat{\phi}_{c^*})$, for the alternative hypothesis, where $\{c^*(\cdot\,; \phi_{c^*}) \mid \phi_{c^*} \in \Phi_{c^*}\} \in \mathcal{C}$. In developing
the theory, we assume that the “true” copula is contained in $\mathcal{C}$, i.e., our models are well
specified. The effect of model misspecification in the context of copula-based hypothesis testing
has been addressed by Iyengar [36]. In general, the selected copula, $c^*$, may not admit any
parameter which can also describe the copula model under the null hypothesis, $c_0(\cdot\,; \phi_{c_0})$. That
is, there may not exist $\phi_{c^*} \in \Phi_{c^*}$ such that $c^*(\cdot\,; \phi_{c^*}) = c_0(\cdot\,; \phi_{c_0})$. Therefore, our formulation
also considers the more general case of testing non-nested hypotheses [20,21,74,100,105].

The hypothesis testing problem is solved under the Neyman-Pearson framework, i.e., we
seek to design tests that operate under a false-alarm constraint. We use the generalized
likelihood ratio test (GLRT) as the starting point and investigate its properties. The GLR test-statistic
is modified, to accommodate uncertainty about $c_1$, by also maximizing over $\mathcal{C}$, so that

$$T_{GLR} = \log f_{\mathbf{X}}(\mathbf{x}; \hat{\theta}^*) - \log f_{\mathbf{X}}(\mathbf{x}; \theta_0) \tag{4.11}$$

$$\begin{aligned} = {} & \sum_{j=1}^{N} \left[\log f_{X_1}(x_{1j}; \hat{\psi}_1) - \log f_{X_1}(x_{1j}; \psi_{10})\right] \\ & + \sum_{j=1}^{N} \left[\log f_{X_2}(x_{2j}; \hat{\psi}_2) - \log f_{X_2}(x_{2j}; \psi_{20})\right] \\ & + \sum_{j=1}^{N} \left[\log c^*\left(u_{1j}(\hat{\psi}_1), u_{2j}(\hat{\psi}_2); \hat{\phi}_{c^*}\right) - \log c_0\left(u_{1j}(\psi_{10}), u_{2j}(\psi_{20}); \phi_{c_0}\right)\right] \end{aligned} \tag{4.12}$$

and

$$\hat{\psi}_i = \arg\sup_{\psi_i} \sum_{j=1}^{N} \log f_{X_i}(x_{ij}; \psi_i), \quad i = 1, 2. \tag{4.13}$$

Given an arbitrary indexing set, $\mathcal{M}$, for the copula library, $\mathcal{C}$, $c^* = c_{m^*}$ such that for any
$c_m \in \mathcal{C}$, $m \in \mathcal{M}$,

$$m^* = \arg\max_{m \in \mathcal{M}} \left\{ \ell_{c_m}(\hat{\phi}_m) \;\middle|\; \hat{\phi}_m = \arg\sup_{\phi_m} \ell_{c_m}(\phi_m) \right\}, \tag{4.14}$$

$$\ell_{c_m}(\phi_m) = \sum_{j=1}^{N} \log c_m\left(u_{1j}(\hat{\psi}_1), u_{2j}(\hat{\psi}_2); \phi_m\right). \tag{4.15}$$

In (4.11), $\hat{\theta}^*$ is obtained by estimating the marginal parameters, independently, prior to
obtaining $\hat{\phi}_m$, as in (4.14) and (4.15). This two-step procedure is known as the inference for
margins (IFM) method [17] and is different from estimating the marginal and copula
parameters simultaneously. It follows from (4.14) that $\hat{\phi}_{c^*} \in \Phi_{c^*}$, in (4.12), is the maximum likelihood
estimate (MLE) of the parameter of $c^*$. The decision rule is

$$T_{GLR} \underset{H_0}{\overset{H_1}{\gtrless}} \eta_{\alpha}, \tag{4.16}$$

where $\eta_{\alpha}$ is the threshold that satisfies the constraint $P_F = \alpha$. Deriving the distribution of
$T_{GLR}$ for finite $N$ is very difficult. Asymptotic distributions, however, may be derived. In the
analysis that follows, we denote $f_{\mathbf{X}}(\mathbf{x}_j; \hat{\theta}^*)$ as $\hat{f}_{\mathbf{X}}$ and include the subscript $N$ in the notation
for finite-sample statistics to emphasize dependence on sample size, as necessary.
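
The copula selection step in (4.14) and (4.15) can be sketched as follows. The sketch assumes that the probability integral transforms $u_{1j}$, $u_{2j}$ have already been computed in the first IFM step, uses a reduced library (Clayton with $\phi > 0$, Gaussian, and product copulas), and maximizes each one-parameter likelihood with a golden-section search. The library contents, the search intervals, the sampler and every function name are assumptions made for illustration; they are not the implementation used in this work.

```python
import math
import random
from statistics import NormalDist


def clayton_loglik(uv, phi):
    """Sum of log Clayton copula densities (phi > 0), one term per pair as in (4.15)."""
    total = 0.0
    for u, v in uv:
        s = u ** -phi + v ** -phi - 1.0
        total += (math.log1p(phi) - (1.0 + phi) * (math.log(u) + math.log(v))
                  - (2.0 + 1.0 / phi) * math.log(s))
    return total


def gaussian_loglik(uv, rho):
    """Sum of log Gaussian copula densities (|rho| < 1)."""
    inv = NormalDist().inv_cdf
    total = 0.0
    for u, v in uv:
        a, b = inv(u), inv(v)
        total += (-0.5 * math.log(1.0 - rho * rho)
                  - (rho * rho * (a * a + b * b) - 2.0 * rho * a * b)
                  / (2.0 * (1.0 - rho * rho)))
    return total


def golden_max(f, lo, hi, iters=40):
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc > fd:
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    x = 0.5 * (a + b)
    return x, f(x)


# Hypothetical reduced library: name -> (log-likelihood, lower bound, upper bound).
LIBRARY = {
    "clayton": (clayton_loglik, 0.01, 15.0),
    "gaussian": (gaussian_loglik, -0.99, 0.99),
}


def select_copula(uv):
    """Return (name, parameter estimate, maximized log-likelihood), in the spirit of (4.14)."""
    best = ("product", None, 0.0)  # the product copula has density 1, log-likelihood 0
    for name, (loglik, lo, hi) in LIBRARY.items():
        est, val = golden_max(lambda p: loglik(uv, p), lo, hi)
        if val > best[2]:
            best = (name, est, val)
    return best


def sample_clayton(n, phi, rng):
    """Draw n pairs from a Clayton copula by conditional inversion (test data only)."""
    clip = lambda w: min(max(w, 1e-12), 1.0 - 1e-12)
    out = []
    for _ in range(n):
        u, w = clip(rng.random()), clip(rng.random())
        v = ((w ** (-phi / (1.0 + phi)) - 1.0) * u ** -phi + 1.0) ** (-1.0 / phi)
        out.append((u, clip(v)))
    return out


if __name__ == "__main__":
    pairs = sample_clayton(2000, 2.0, random.Random(0))
    print(select_copula(pairs))
```

The copula term of (4.12) is then the selected maximized log-likelihood minus the log-likelihood of the known null copula $c_0$ evaluated at the null parameters.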

4.3.1  Nested  hypotheses  or  nested  copula  models 

In general, for arbitrary hypotheses $H$ and $K$, $H$ is said to be nested in $K$ if it is possible to
derive $H$ from $K$ “either by means of an exact set of parametric restrictions or as a result of
a limiting process” [74]. To define nesting in a more precise manner, it is helpful to define a
model as a set of p.d.f.s indexed over admissible parameter values. The p.d.f. form of a nested
model [100] is stated in Definition 4.1. Based on this, we formally define a nested copula
model, which allows us to derive asymptotic results for our formulation.

Definition 4.1 (Nested model). For a continuous random vector $Z \in \mathcal{Z} \subset \mathbb{R}^n$, given two
models, $\mathscr{F}_0 = \{f_0(z; \theta_0) \mid \theta_0 \in \Theta_0\}$ and $\mathscr{F}_1 = \{f_1(z; \theta_1) \mid \theta_1 \in \Theta_1\}$, where $f_0$ and $f_1$ are
arbitrary p.d.f.s of $Z$, $\mathscr{F}_0$ is said to be nested in $\mathscr{F}_1$ if and only if $\mathscr{F}_0 \subset \mathscr{F}_1$ $\forall\, z \in \mathcal{Z}$.

Definition 4.2 (Nested copula model). A copula family or model $\mathscr{C}_0 = \{c_0(u, v; \phi_0) \mid \phi_0 \in \Phi_0\}$
is nested in copula model $\mathscr{C}_1 = \{c_1(u, v; \phi_1) \mid \phi_1 \in \Phi_1\}$ if and only if $\mathscr{C}_0 \subset \mathscr{C}_1$ for
$(u, v) \in [0, 1]^2$ almost everywhere.

Nested copulas and nested models are related to each other through the following lemma.

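Lemma 4.1. For $k = 0, 1$, arbitrary continuous p.d.f.s $f_k$, $g_k$, $h_k$ and copulas $c_k$, define the
models

$$\mathscr{F}_k = \{f_k(x; \psi_{f_k}) \mid \psi_{f_k} \in \Psi_{f_k}\}, \quad x \in X \subset \mathbb{R}, \tag{4.17}$$

$$\mathscr{G}_k = \{g_k(y; \psi_{g_k}) \mid \psi_{g_k} \in \Psi_{g_k}\}, \quad y \in Y \subset \mathbb{R}, \tag{4.18}$$

$$\mathscr{C}_k = \{c_k(u, v; \phi_k) \mid \phi_k \in \Phi_k\}, \quad (u, v) \in [0, 1]^2, \tag{4.19}$$

$$\mathscr{H}_k = \{h_k(x, y) = f_k(x)\, g_k(y)\, c_k(F_k(x), G_k(y)) \mid f_k \in \mathscr{F}_k,\ g_k \in \mathscr{G}_k,\ c_k \in \mathscr{C}_k\}, \tag{4.20}$$

where $X$ and $Y$ are closed with $x_L = \inf X$, $y_L = \inf Y$, $F_k(x) = \int_{x_L}^{x} f_k(x')\,dx'$ and
$G_k(y) = \int_{y_L}^{y} g_k(y')\,dy'$.

A joint model $\mathscr{H}_0$ is nested in $\mathscr{H}_1$ if and only if the marginal and copula models are all nested,
i.e., $\mathscr{F}_0$ is nested in $\mathscr{F}_1$, $\mathscr{G}_0$ is nested in $\mathscr{G}_1$ and $\mathscr{C}_0$ is nested in $\mathscr{C}_1$.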

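Proof. We need to prove both “if” and “only if” parts.

Nested marginal models $\Rightarrow$ nested joint model: It is easy to see that if $\mathscr{F}_0 \subset \mathscr{F}_1$, $\mathscr{G}_0 \subset \mathscr{G}_1$,
and $\mathscr{C}_0 \subset \mathscr{C}_1$, then any product $f_0 g_0 c_0 \in \mathscr{H}_0$ is also contained in $\mathscr{H}_1$, and hence
$\mathscr{H}_0 \subset \mathscr{H}_1$ $\forall\, (x, y) \in X \times Y$.

Converse: For $k = 0, 1$, we first define the CDF models

$$\mathscr{H}'_k = \left\{H_k(x, y) = \int_{x_L}^{x} \int_{y_L}^{y} h_k(x', y')\,dx'\,dy' \;\middle|\; h_k \in \mathscr{H}_k\right\},$$

$$\mathscr{C}'_k = \left\{C_k(u, v) = \int_{0}^{u} \int_{0}^{v} c_k(u', v')\,du'\,dv' \;\middle|\; c_k \in \mathscr{C}_k\right\}.$$

Then, since $\mathscr{H}_0 \subset \mathscr{H}_1$ $\forall\, (x, y) \in X \times Y$, we have $\mathscr{H}'_0 \subset \mathscr{H}'_1$. Using (3.6) from Theorem 3.1,
$\{C_0(F_0(x), G_0(y); \phi_0) \mid \phi_0 \in \Phi_0\} \subset \{C_1(F_1(x), G_1(y); \phi_1) \mid \phi_1 \in \Phi_1\}$. Since (a) $X$ and $Y$ are
closed, and (b) $F_k$, $G_k$ are continuous, $\mathscr{C}'_0 \subset \mathscr{C}'_1$ $\forall\, (u, v) \in [0, 1]^2$. Consequently $\mathscr{C}_0 \subset \mathscr{C}_1$, i.e.,
$c_0 \in \mathscr{C}_0 \Rightarrow c_0 \in \mathscr{C}_1$.

That $\mathscr{F}_0 \subset \mathscr{F}_1$ and $\mathscr{G}_0 \subset \mathscr{G}_1$ remains to be proved. We prove this by contradiction. Three
contradictory cases exist: (a) $\mathscr{F}_0 \not\subset \mathscr{F}_1$ and $\mathscr{G}_0 \subset \mathscr{G}_1$; (b) $\mathscr{F}_0 \subset \mathscr{F}_1$ and $\mathscr{G}_0 \not\subset \mathscr{G}_1$; and
(c) $\mathscr{F}_0 \not\subset \mathscr{F}_1$ and $\mathscr{G}_0 \not\subset \mathscr{G}_1$. We provide detailed arguments to show that case (c) is not
possible; cases (a) and (b) can be disproved using arguments similar to those presented below.

Assume $\exists\, f \in \mathscr{F}_0$, $f \notin \mathscr{F}_1$ and $g \in \mathscr{G}_0$, $g \notin \mathscr{G}_1$, so that $h(x, y) = f(x)\, g(y)\, c_0(F(x), G(y))$,
where $h(x, y) \in \mathscr{H}_0 \subset \mathscr{H}_1$ (i.e., $h(x, y) \in \mathscr{H}_1$). Therefore, following (4.20) for $k = 1$, there
must exist marginal densities $f_1 \neq f$, $g_1 \neq g$ such that $f_1 \in \mathscr{F}_1$, $g_1 \in \mathscr{G}_1$ and
$h(x, y) = f_1(x)\, g_1(y)\, c_0(F_1(x), G_1(y))$. Thus, a single copula $c_0$ is associated with the same joint density
$h(x, y)$ for distinct marginal pairs $\{f(x), g(y)\}$ and $\{f_1(x), g_1(y)\}$. That is,

$f(x) \neq f_1(x)$ contradicts $f(x) = \int_Y h(x, y)\,dy = f_1(x)$ and
$g(y) \neq g_1(y)$ contradicts $g(y) = \int_X h(x, y)\,dx = g_1(y)$,

implying that $\nexists\, f \in \mathscr{F}_0$, $f \notin \mathscr{F}_1$ and $g \in \mathscr{G}_0$, $g \notin \mathscr{G}_1$. Hence, $\mathscr{H}_0 \subset \mathscr{H}_1 \Rightarrow \mathscr{F}_0 \subset \mathscr{F}_1$,
$\mathscr{G}_0 \subset \mathscr{G}_1$ and $\mathscr{C}_0 \subset \mathscr{C}_1$ $\forall\, (x, y) \in X \times Y$. □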

Lemma 4.1 allows us to state and prove chi-square convergence in distribution for $T_{GLR}$
through Proposition 4.1.

Proposition 4.1. Suppose that the hypothesis testing problem in (4.8) is specified as follows:

(i) The joint distribution, $f^1_{\mathbf{X}}$, and parameter space under $H_1$ are not known since $c_1$ is
unknown. The p.d.f. $\hat{f}_{\mathbf{X}}$ and corresponding parameter space, $\Theta_{c^*}$, are determined as an
outcome of the copula selection process in (4.14).

(ii) $\mathbf{X}$ is well specified in $\{f^0_{\mathbf{X}}\} \cup \{f_{\mathbf{X}}(\mathbf{x}_j; \theta) \mid \theta \in \Theta_{c^*}\}$.

(iii) There may exist marginal parameters which are fixed and the same for both
hypotheses. Denote the parameter subspace containing only free parameters as $\Theta'_{c^*}$. Then,
$\dim(\Theta_{c^*}) - \dim(\Theta'_{c^*}) = \nu \geq 0$.

(iv) The copula model under $H_0$ is nested in the copula model selected under $H_1$. That is,
for known $\phi_{c_0}$ such that $\mathscr{C}_0 = \{c_0(u, v; \phi_{c_0})\}$, and for $\mathscr{C}^* = \{c^*(u, v; \phi_{c^*}) \mid \phi_{c^*} \in \Phi_{c^*}\}$,
$\mathscr{C}_0 \subset \mathscr{C}^*$ $\forall\, (u, v) \in [0, 1]^2$.

Additionally, for sensor $i$ and hypothesis $k$, $\psi_{ik} \in \Psi_{ik} \subset \Psi$,

$$\Psi = \left\{[\alpha, \beta, \gamma, \delta]^T \;\middle|\; \alpha \in [\epsilon, 1) \cup (1, 2),\ \epsilon > 0;\ |\beta| < \min(\alpha, 2 - \alpha);\ \gamma \in (0, \infty);\ \delta \in \mathbb{R}\right\}. \tag{4.21}$$

Then, under $H_0$, as $N \to \infty$,

$$2\, T_{GLR} \xrightarrow{d} \chi^2_{\mu - \nu}, \tag{4.22}$$

where $\mu = \dim(\Theta_{c^*}) > \nu$, and $\chi^2_{\mu - \nu}$ is a chi-square random variable with $\mu - \nu$ degrees of
freedom.

Proof. To prove $\chi^2$ convergence, as seen in (4.22), we essentially invoke Wilks' theorem (WT)
[106]. However, to apply WT, we first need to prove that (a) the joint null model
$\mathscr{H}_0 = \{f_{\mathbf{X}}(\mathbf{x}_j; \theta_0)\}$ is nested in the joint model $\mathscr{H}_1 = \{f_{\mathbf{X}}(\mathbf{x}_j; \theta) \mid \theta \in \Theta_{c^*}\}$, and (b) that the
parameter estimates obtained using the IFM method are asymptotically normal. The marginal
models under the null hypothesis are nested for each $i = 1, 2$, i.e.,
$\{f_{X_i}(x_{ij}; \psi_i) \mid \psi_i = \psi_{i0}\} \subset \{f_{X_i}(x_{ij}; \psi_i) \mid \psi_i \in \Psi\}$. Since both the marginal and copula models are nested, from
Lemma 4.1, $\mathscr{H}_0 \subset \mathscr{H}_1$.

We use primes to denote both free parameters, which need to be estimated, and their
corresponding subspaces. Thus, $\psi'_i$ denotes the vector of free marginal parameters, for the $i$-th
sensor, contained in the subspace $\Psi'$. Given (4.21),

$$\sqrt{N}\,(\hat{\psi}'_i - \psi'_i) \xrightarrow{d} \mathcal{N}\left(0, \mathcal{I}^{-1}(\psi'_i)\right), \tag{4.23}$$

where² $\mathcal{I}(\psi'_i)$ is the corresponding Fisher information matrix evaluated at the true value,
$\psi'_i$ [26]. For marginal estimates which are asymptotically normal, as in (4.23),

$$\sqrt{N}\,(\hat{\theta}'^{*} - \theta'_{c^*}) \xrightarrow{d} \mathcal{N}\left(0, G^{-1}(\theta'_{c^*})\right), \tag{4.24}$$

where $G(\theta'_{c^*})$ is the Godambe information matrix [41] evaluated at the true value, $\theta'_{c^*}$. Since
$\mathscr{H}_0 \subset \mathscr{H}_1$, and $\hat{\theta}^*$ is asymptotically normal, WT yields (4.22). □

² $\mathcal{N}(m, C)$ denotes a normal random vector with mean $m$ and covariance $C$.

Remark. In the proof of Proposition 4.1, the restriction on $\Psi$, as given in (4.21), is used to
assert the asymptotic normality of $\hat{\psi}_i$ under $H_1$. We additionally require that (4.21) be true
under $H_0$ to ensure that the two hypotheses are nested.

Proposition 4.1 implies that we can express the probability of false alarm as a function of
the threshold, $\eta$, using the $\chi^2$ CDF, $F_{\chi^2}$, i.e.,

$$P_F(\eta) = 1 - F_{\chi^2}(2\eta;\ \mu - \nu).$$

Thus, for $P_F = \alpha$, the detector threshold can be designed as

$$\eta_{\alpha} = 0.5\, F^{-1}_{\chi^2}(1 - \alpha;\ \mu - \nu). \tag{4.25}$$
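
The threshold design in (4.25) can be sketched with the Python standard library alone. Since the standard library has no chi-square quantile function, the sketch below evaluates the chi-square CDF through the power series of the regularized lower incomplete gamma function and inverts it by bisection. The function names and the numerical tolerances are assumptions for illustration; `mu` and `nu` stand for $\mu$ and $\nu$ of Proposition 4.1.

```python
import math


def reg_lower_gamma(a, x, max_iter=1000, tol=1e-14):
    """Regularized lower incomplete gamma function P(a, x) via its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, max_iter):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def chi2_cdf(x, dof):
    """CDF of a central chi-square random variable with dof degrees of freedom."""
    return reg_lower_gamma(dof / 2.0, x / 2.0)


def chi2_quantile(p, dof):
    """Inverse chi-square CDF by bracketing and bisection (intended for moderate p)."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, dof) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, dof) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def glrt_threshold(alpha, mu, nu):
    """eta_alpha = 0.5 * F^{-1}(1 - alpha; mu - nu), following (4.25)."""
    return 0.5 * chi2_quantile(1.0 - alpha, mu - nu)


if __name__ == "__main__":
    # Example: two free parameters (mu - nu = 2) and a 5% false-alarm constraint.
    print(glrt_threshold(0.05, mu=3, nu=1))
```

For $\mu - \nu = 2$ the chi-square CDF is $1 - e^{-x/2}$, so the threshold reduces to $\eta_{\alpha} = -\log \alpha$, which gives a convenient sanity check.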

Using (4.24), under the conditions listed in Proposition 4.1, the asymptotic convergence of
the test-statistic under $H_1$ is an extension of Wald's result [101],

$$2\, T_{GLR} \xrightarrow{d} \chi'^{2}_{\mu - \nu}\left((\theta'_{c^*} - \theta'_0)^T\, G(\theta'_{c^*})\, (\theta'_{c^*} - \theta'_0)\right), \tag{4.26}$$

where $\chi'^{2}_{r}(\lambda)$ denotes a non-central chi-square distribution with $r$ degrees of freedom and
non-centrality parameter $\lambda$. The vector $\theta'_0$ contains the parameters under $H_0$; it is a point in the
$\mu - \nu$ dimensional subspace $\Theta'_{c_0} = \Psi'_1 \times \Psi'_2 \times \Phi_{c_0}$.
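
Given a threshold and a value for the non-centrality parameter, the limit law in (4.26) yields an approximate probability of detection. The following Python sketch estimates it by Monte Carlo, drawing a non-central chi-square variate as a sum of squared unit-variance normals with one shifted mean. The non-centrality value is supplied by the caller and is purely hypothetical here, as are the function names, the number of trials and the seed.

```python
import math
import random


def noncentral_chi2_sample(dof, noncentrality, rng):
    """One draw of (Z_1 + sqrt(lambda))^2 + Z_2^2 + ... + Z_dof^2 with Z_i IID N(0, 1)."""
    total = (rng.gauss(0.0, 1.0) + math.sqrt(noncentrality)) ** 2
    for _ in range(dof - 1):
        total += rng.gauss(0.0, 1.0) ** 2
    return total


def detection_probability(eta, dof, noncentrality, trials=20000, seed=0):
    """Monte Carlo estimate of Pr[2 T_GLR > 2 eta] under the asymptotic law in (4.26)."""
    rng = random.Random(seed)
    hits = sum(noncentral_chi2_sample(dof, noncentrality, rng) > 2.0 * eta
               for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    eta = -math.log(0.05)  # threshold for P_F = 0.05 when mu - nu = 2
    for lam in (0.0, 5.0, 25.0):
        print(lam, detection_probability(eta, dof=2, noncentrality=lam))
```

With zero non-centrality the estimate should return approximately the false-alarm level used to set the threshold, which ties this sketch back to (4.25).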

4.3.2  Non-nested  models 

We now consider the case when $\mathscr{H}_0$ is not nested in $\mathscr{H}_1$. The general class of non-nested
hypothesis testing problems was first considered by Cox [20,21]. Subsequently, White [105]
analyzed the problem to establish the regularity conditions under which Cox's proposed test
is valid. Vuong [100] has generalized this work by defining precise forms of model nesting
and deriving a hypothesis-testing-based model selection scheme. These formulations consider
a composite null hypothesis. We, however, formulate a simple null hypothesis. We derive the
distribution of the test-statistic under $H_0$ and observe that it has a form which is similar to
previously derived results [21,74,100,105].

Non-nestedness implies that the p.d.f. under $H_1$ can have a different functional form as
compared to the p.d.f. under $H_0$. Consequently, though observations may be generated by
$\mathscr{H}_0$, likelihood maximization, in $T_{GLR}$, is carried out for a function in $\mathscr{H}_1$. Hence, asymptotic
convergence of the MLE to a pseudo-true value needs to be defined.

Definition 4.3 (Pseudo-true value and QMLE [74]). Suppose a DGP $\{h(z; \theta_h) \mid \theta_h = \theta_{h0} \in \Theta_h\}$
is defined over the random vector $Z \in \mathcal{Z} \subset \mathbb{R}^n$. The pseudo-true value for a
model $\{g(z; \theta_g) \mid \theta_g \in \Theta_g\}$ is then defined as that value of $\theta_g$ which minimizes the
Kullback-Leibler divergence (KLD), $D_{KL}$, between $h$ and $g$. That is,

$$\tilde{\theta}_g = \arg\min_{\theta_g \in \Theta_g} D_{KL}\left(h(z; \theta_{h0}) \,\|\, g(z; \theta_g)\right) = \arg\sup_{\theta_g \in \Theta_g} E_h\{\log g(z; \theta_g)\}, \tag{4.27}$$

where $E_h$ is the expectation under $h$. For the $N$-sample IID sequence $\{z_j\}_{j=1}^{N}$, the
quasi-maximum likelihood estimate (QMLE) is the sample estimate,

$$\hat{\theta}_{g,N} = \arg\sup_{\theta_g \in \Theta_g} (1/N) \sum_{j=1}^{N} \log g(z_j; \theta_g). \tag{4.28}$$

Under mild regularity conditions, it can be shown that $\hat{\theta}_{g,N}$ exists and is a strongly
consistent estimator of $\tilde{\theta}_g$, i.e., $\hat{\theta}_{g,N} \xrightarrow{a.s.} \tilde{\theta}_g$ [104], and, therefore, $\hat{\theta}_{g,N} \xrightarrow{p} \tilde{\theta}_g$. Using the consistency
of the QMLE, we next prove that, asymptotically, the model selection scheme of (4.14) will
select the true copula.

Proposition 4.2. If $\{c(u_1, u_2; \phi_c) \mid \phi_c \in \Phi_c\} \in \mathcal{C}$ is the copula model corresponding to the
DGP, the selection process in (4.14) will select the true copula, $c$, asymptotically in $N$.

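Proof. Consider an arbitrary indexing set, $\mathcal{M}$, for the copula library, $\mathcal{C}$. Suppose

$$\{c_m(u_1, u_2; \phi_{c_m}) \mid \phi_{c_m} \in \Phi_{c_m}\} \in \mathcal{C}$$

such that $c_m \neq c$ for some $m \in \mathcal{M}$. Recall that (4.14) maximizes the likelihood of $c_m(\cdot\,; \phi_{c_m})$ over $\phi_{c_m}$ for
every $c_m \in \mathcal{C}$, and selects that copula which has the maximum likelihood over all $m \in \mathcal{M}$.
Also, likelihood maximization does not change under normalization by $N$, i.e., for
$\ell_{c,N}(\phi_c) = \sum_{j=1}^{N} \log c(u_{1j}, u_{2j}; \phi_c)$,

$$\hat{\phi}_{c,N} = \arg\sup \ell_{c,N}(\phi_c) = \arg\sup N^{-1} \ell_{c,N}(\phi_c).$$

A similar observation holds for $c_m$. Although the true parameter value of $c$ is unknown a priori,
it exists and is denoted as $\phi'_c$. Since $\hat{\phi}_{c,N} \xrightarrow{p} \phi'_c$ and $\hat{\phi}_{c_m,N} \xrightarrow{p} \tilde{\phi}_{c_m}$, we can write³:

$$\operatorname*{plim}_{N \to \infty} \sum_{j=1}^{N} \left[\log c(\cdot\,; \hat{\phi}_{c,N}) - \log c_m(\cdot\,; \hat{\phi}_{c_m,N})\right] / N \tag{4.29}$$

$$= E_c\left[\log\left\{c(u_1, u_2; \phi'_c) / c_m(u_1, u_2; \tilde{\phi}_{c_m})\right\}\right] \tag{4.30}$$

$$= D_{KL}(c \,\|\, c_m) > 0 \tag{4.31}$$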

where $E_c$ is the expectation under $c$ and (4.30) is a consequence of the law of large numbers. In
(4.31), $D_{KL}(c \,\|\, c_m)$ denotes the KLD between $c$ and $c_m$; the inequality is strict since $c \neq c_m$.
From (4.29) and (4.31),

$$\operatorname*{plim}_{N \to \infty} \frac{1}{N}\left[\ell_{c,N}(\hat{\phi}_{c,N}) - \ell_{c_m,N}(\hat{\phi}_{c_m,N})\right] > 0.$$

The above arguments hold true for all $c_m \in \mathcal{C}$ distinct from $c$. Therefore, using (4.14), $c$ will
be selected as $N \to \infty$. □

³ If $X_N \xrightarrow{p} X$ as $N \to \infty$, we can equivalently write $X = \operatorname{plim} X_N$.
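
The pseudo-true value of Definition 4.3, and the consistency of the QMLE used in the proof above, can be illustrated with a deliberately simple misspecified model. The sketch below assumes, purely for illustration, a Laplace DGP and a zero-mean Gaussian model with unknown variance; neither appears in this chapter. For this pair, $E_h\{\log g\} = -\tfrac{1}{2}\log(2\pi\sigma^2) - E_h[z^2]/(2\sigma^2)$ is maximized at $\sigma^2 = E_h[z^2]$, which is twice the squared Laplace scale, and the QMLE is the sample mean of $z^2$.

```python
import random


def sample_laplace(n, scale, rng):
    """Draw n IID Laplace(0, scale) variates as a random sign times an exponential."""
    return [rng.choice((-1.0, 1.0)) * rng.expovariate(1.0 / scale) for _ in range(n)]


def gaussian_qmle_variance(z):
    """QMLE of sigma^2 under the zero-mean Gaussian model: the sample mean of z^2."""
    return sum(v * v for v in z) / len(z)


def pseudo_true_variance(scale):
    """Minimizer of D_KL(Laplace(0, scale) || N(0, sigma^2)) over sigma^2."""
    return 2.0 * scale * scale


if __name__ == "__main__":
    rng = random.Random(1)
    for n in (100, 10000, 200000):
        z = sample_laplace(n, 1.0, rng)
        print(n, gaussian_qmle_variance(z), pseudo_true_variance(1.0))
```

As the sample size grows, the QMLE settles near the pseudo-true value even though no Gaussian density equals the Laplace density, which is the behavior that Definition 4.3 formalizes.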

Proposition 4.2 is significant because it implies that, when the data are generated under
$H_0$, if $c_0$ is contained in $\mathcal{C}$, copula selection will always (asymptotically) select $c_0$ as $c^*$. In
effect, there are very specific cases when $T_{GLR}$, under $H_0$, is evaluated from p.d.f.s contained
in non-nested models:

1. marginals are nested, but we know the function $c_1$ a priori;

2. marginals come from the same family ($\alpha$-stable in this paper), but are not nested because the
marginal parameters are defined over disjoint subspaces over the hypotheses.

For the first case (nested $\alpha$-stable marginals and non-nested copulas), since the functional
form of the copula under $H_1$ is known, $\mathcal{C}$ contains only $c_1$, and, trivially, $c^* = c_1$. Hence, there
is no model selection component to the detection problem. Irrespective of the true hypothesis,
the marginal parameter estimates, $\hat{\psi}_i$, will converge to the true value, $\psi_i$, and, therefore, $u_{ij}(\hat{\psi}_i)$
will be asymptotically uniform. Thus, $\hat{\phi}_{c_1}$, obtained by maximizing $c_1(u_{1j}(\hat{\psi}_1), u_{2j}(\hat{\psi}_2); \phi_{c_1})$,
will converge to $\tilde{\phi}_{c_1}$ and, therefore, we can use the IFM method for copula parameter
estimation.

The second case, with non-nested marginals, will occur when the problem is defined such
that $H_0$ is specified for a marginal parameter that cannot be achieved by likelihood
maximization under $H_1$. This includes the case where $H_0$ is defined as one of the points not allowed
for $\alpha$-stable ML estimation (see (4.21)), e.g., when dependent Cauchy distributed marginals
under $H_0$ ($\alpha_i = 1$) are tested against the composite alternative of dependent, stable distributed
marginals. Alternatively, there may exist additional knowledge about $H_1$ that indicates a
restricted marginal parameter range; $\hat{\psi}_i$ is then a range-restricted MLE. When the marginals are
not nested, $u_{ij}(\hat{\psi}_i)$ is not asymptotically distributed as $U(0, 1)$, and the IFM estimate, $\hat{\phi}_{c^*}$, is
not consistent. In this case, we use the nonparametric empirical CDF (ECDF),

$$\hat{F}_{X_i}(t) = (1/N) \sum_{j=1}^{N} \mathbb{1}\{x_{ij} \leq t\}. \tag{4.32}$$

In (4.12), the estimate $\hat{u}_{ij} = \hat{F}_{X_i}(x_{ij})$ is used in place of $u_{ij}(\hat{\psi}_i)$. Since $\hat{F}_{X_i} \xrightarrow{p} F_{X_i}$, it can
be shown that a two-step procedure, using $\hat{u}_{ij}$ instead of $u_{ij}(\hat{\psi}_i)$, also leads to a consistent
estimate of the copula parameter [41].
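
A minimal Python sketch of the ECDF in (4.32) and of the resulting pseudo-observations $\hat{u}_{ij}$ is given below. The optional rescaling by $N/(N+1)$ is an assumption added here, a common practical device that keeps every $\hat{u}_{ij}$ strictly inside $(0, 1)$ so that copula densities remain finite; it is not part of (4.32). The function names are likewise illustrative.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the function t -> (1/N) * #{x_ij <= t}, following (4.32)."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda t: bisect_right(ordered, t) / n


def pseudo_observations(sample, rescale=True):
    """u_hat_ij = F_hat(x_ij), optionally scaled by N / (N + 1) to stay inside (0, 1)."""
    f_hat = ecdf(sample)
    n = len(sample)
    factor = n / (n + 1.0) if rescale else 1.0
    return [factor * f_hat(x) for x in sample]


if __name__ == "__main__":
    x1 = [3.0, 1.0, 2.0, 10.0]
    print(pseudo_observations(x1, rescale=False))
    print(pseudo_observations(x1))
```

Because the ECDF depends only on ranks, the pseudo-observations are unaffected by heavy tails in the marginals, which is what makes them usable when the parametric marginal fit is not consistent.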

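When the data are generated by $H_1$, both $u_{ij}(\hat{\psi}_i)$ and $\hat{u}_{ij}$ are asymptotically uniform,
and as a result $\hat{\phi}_{c^*}$ is consistent. Thus, Proposition 4.1 and (4.26) hold if $\hat{u}_{ij}$ is used instead
of $u_{ij}(\hat{\psi}_i)$. In Proposition 4.3, we establish the asymptotic distribution of $T_{GLR}$ under $H_0$ for
non-nested hypotheses.

Proposition 4.3. For the formulation in (4.8), the null and alternative p.d.f. models are $\mathscr{H}_0$ and
$\mathscr{H}_1$, respectively, so that $\mathscr{H}_0 \not\subset \mathscr{H}_1$. Denote the pseudo-true value of $\theta$ under $H_1$ as $\tilde{\theta}_1$, so that
$\tilde{f}^1_{\mathbf{X}} = f_{\mathbf{X}}(\mathbf{x}_j; \tilde{\theta}_1)$. Under $H_0$,

$$\sqrt{N}\left(T_{GLR}/N + D_{KL}(f^0_{\mathbf{X}} \,\|\, \tilde{f}^1_{\mathbf{X}})\right) \xrightarrow{d} \mathcal{N}(0, \omega^2), \quad \text{where} \tag{4.33}$$

$$\omega^2 = E_0\left[\{\log (f^0_{\mathbf{X}} / \tilde{f}^1_{\mathbf{X}})\}^2\right] - \left(E_0\left[\log (f^0_{\mathbf{X}} / \tilde{f}^1_{\mathbf{X}})\right]\right)^2 \tag{4.34}$$

and $E_0$ is the expectation under $f^0_{\mathbf{X}}$.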

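Proof. Let the log-likelihood under $H_0$ be $\ell_N(\theta_0) = \sum_j \log f_X(x_j; \theta_0)$, $\theta_0 \in \Theta_{c_0} \subset \mathbb{R}^{p_0}$, and the maximized log-likelihood under (composite) $H_1$ be $\ell_N(\hat{\theta}_1) = \sum_j \log f_X(x_j; \hat{\theta}_1)$, $\hat{\theta}_1 \in \Theta_{c^*} \subset \mathbb{R}^{p_1}$. Expanding $\ell_N(\bar{\theta}_1) = \sum_j \log f_X(x_j; \bar{\theta}_1)$, $\bar{\theta}_1 \in \mathbb{R}^{p_1}$, about $\hat{\theta}_1$, using the mean-value form for the remainder in Taylor's theorem, we obtain

$$\ell_N(\bar{\theta}_1) = \ell_N(\hat{\theta}_1) + [\nabla_\theta \ell_N(\hat{\theta}_1)]^T(\bar{\theta}_1 - \hat{\theta}_1) + \tfrac{1}{2}(\bar{\theta}_1 - \hat{\theta}_1)^T[\nabla_\theta^2 \ell_N(\tilde{\theta})](\bar{\theta}_1 - \hat{\theta}_1), \qquad (4.35)$$

where $\tilde{\theta}$ lies on the segment joining $\bar{\theta}_1$ and $\hat{\theta}_1$. As a consequence of likelihood maximization, $\nabla_\theta \ell_N(\hat{\theta}_1) = 0$. The law of large numbers implies that $(1/N)\nabla_\theta^2 \ell_N(\hat{\theta}_1) \xrightarrow{p} E_0[\nabla_\theta^2 \log f_X^1]$, where the expectation is under $H_0$ since it corresponds to the hypothesis under which the data are being generated. This convergence also holds for the likelihood evaluated at $\tilde{\theta}$ as $(\bar{\theta}_1 - \hat{\theta}_1) \xrightarrow{p} 0$ and, thus, the error term $(\bar{\theta}_1 - \tilde{\theta})$ converges similarly. Therefore, we can write $(1/N)\nabla_\theta^2 \ell_N(\tilde{\theta}) = E_0[\nabla_\theta^2 \log f_X^1] + Y_N$, where $Y_N$ is a $p_1 \times p_1$ matrix such that each element is an $o_p(1)$ random variable.⁴ Let $A_1 = E_0[\nabla_\theta^2 \log f_X^1]$ and $\theta_{e,N} = \bar{\theta}_1 - \hat{\theta}_1$ so that (4.35) becomes

$$\ell_N(\bar{\theta}_1) = \ell_N(\hat{\theta}_1) + \underbrace{(N/2)\,\theta_{e,N}^T (A_1 + Y_N)\,\theta_{e,N}}_{= J_N}, \qquad (4.36)$$

where

$$J_N = (N/2)\,\theta_{e,N}^T A_1 \theta_{e,N} + \tfrac{1}{2}(\sqrt{N}\theta_{e,N})^T Y_N (\sqrt{N}\theta_{e,N}).$$

Also, $\sqrt{N}\theta_{e,N} \xrightarrow{d} \mathcal{N}(0, A_1^{-1} B_1 A_1^{-1})$, $B_1 = E_0[\nabla_\theta \log f_X^1 \cdot \nabla_\theta^T \log f_X^1]$ (see [104, Theorem 3.2]). Each element of $(\sqrt{N}\theta_{e,N})$ converges in distribution to a normal random variable, and hence, is bounded in probability⁵ (see Theorem 2.3.2 in [57]). Lemma 2.3.1 and Theorem 2.1.3 from [57] prove that⁶ $(\sqrt{N}\theta_{e,N})^T Y_N (\sqrt{N}\theta_{e,N})$ is $o_p(1)$. Set $T_N = \ell_N(\bar{\theta}_1) - \ell_N(\theta_0)$. Note that $T_{GLR,N} = T_{GLR} = \ell_N(\hat{\theta}_1) - \ell_N(\theta_0)$. Thus,

$$T_{GLR,N} = T_N - (N/2)\,\theta_{e,N}^T A_1 \theta_{e,N} + o_p(1) \qquad (4.37)$$

$$\Rightarrow \sqrt{N}\left(T_{GLR,N}/N + E_0[\log f_X^0/f_X^1]\right) = -\sqrt{N}\left(-T_N/N - E_0[\log f_X^0/f_X^1]\right) - (\sqrt{N}\,\theta_{e,N}^T A_1 \theta_{e,N})/2 + (1/\sqrt{N})\,o_p(1). \qquad (4.38)$$

$A_1$ does not depend on $N$ and $\theta_{e,N}$ is $o_p(1)$. Consequently, using Lemma 2.3.1 from [57], $\sqrt{N}\,\theta_{e,N}^T A_1 \theta_{e,N}$ is $o_p(1)$. Therefore,

$$\sqrt{N}\left(T_{GLR}/N + D_{KL}(f_X^0 \| f_X^1)\right) = o_p(1) - \sqrt{N}\left\{(1/N)\sum_j \log f_X^0/f_X^1 - E_0[\log f_X^0/f_X^1]\right\}.$$

The second term in the RHS contains the sample mean of $\log f_X^0/f_X^1$ and its expectation; apply the CLT to get (4.34). □

⁴ A random variable $X_N$ is $o_p(b_N)$ if $(X_N/b_N) \xrightarrow{p} 0$ as $N \to \infty$.

⁵ $X_N$ is said to be bounded in probability (denoted $O_p(1)$) iff for every $\epsilon > 0$ $\exists\, B_\epsilon < \infty$ and $N_\epsilon$ such that $\Pr[|X_N| < B_\epsilon] > 1 - \epsilon \;\; \forall N > N_\epsilon$.

⁶ The lemma and two theorems are informally stated as follows (see [57] for more details). Theorem 2.3.2: $X_N \xrightarrow{d} X \Rightarrow X_N = O_p(1)$; Lemma 2.3.1: $o_p(1) \cdot O_p(1) = o_p(1)$; Theorem 2.1.3: $o_p(1) + o_p(1) = o_p(1)$.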

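Although $w^2$ depends on the pseudo-true value under $H_1$, it can be consistently estimated from the observations as the sample variance of the per-sample log-likelihood ratios,

$$\hat{w}^2 = \frac{1}{N}\sum_{j=1}^{N} T_{GLR,j}^2 - \left(\frac{T_{GLR}}{N}\right)^2, \qquad (4.39)$$

where $T_{GLR,j} = \log f_X^1/f_X^0$ is evaluated at the $j$-th observation, so that $T_{GLR} = \sum_j T_{GLR,j}$ [100]. The asymptotic distribution under $H_1$ can be proved in a manner similar to Proposition 4.3.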
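As an illustration of the variance estimate, the following sketch assumes that (4.39) is the usual sample variance of the per-sample log-likelihood ratios $T_{GLR,j}$, i.e. their mean square minus their squared mean. The function name and the numerical values are hypothetical.

```python
def variance_estimate(llr_terms):
    """Sample variance (1/N) * sum(T_j^2) - (mean of T_j)^2 of per-sample log-LR terms."""
    n = len(llr_terms)
    mean = sum(llr_terms) / n
    mean_sq = sum(t * t for t in llr_terms) / n
    return mean_sq - mean * mean


# Hypothetical per-sample log-likelihood ratios from one decision window
terms = [0.5, -0.25, 1.0, 0.75]
w2_hat = variance_estimate(terms)
print(w2_hat)  # 0.21875
```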

Proposition 4.4. For the hypothesis testing problem in Proposition 4.3, under $H_1$, as $N \to \infty$,

$$\{T_{GLR}/N - D_{KL}(f_X^1 \| f_X^0)\}/(\hat{w}/\sqrt{N}) \xrightarrow{d} \mathcal{N}(0, 1) \qquad (4.40)$$

Proof. We proceed along similar lines as Proposition 4.3, but by replacing $E_0$ by $E_1$, i.e., the expectation under $H_1$. Note that since the data are generated under $H_1$, we use $\theta_1$ instead of $\bar{\theta}_1$. Denote the variance of $T_{GLR}$ under $H_1$ as

$$w_1^2 = E_1\left[(\log f_X^1/f_X^0)^2\right] - \left(E_1\left[\log f_X^1/f_X^0\right]\right)^2.$$

The limit in (4.40) follows as $\hat{w}^2 \xrightarrow{p} w_1^2$. □

Equation (4.34) implies that determining the detector threshold requires knowledge of the Kullback-Leibler divergence evaluated at the (pseudo-)true values of the distribution parameters under each hypothesis. Specifically, for $P_F = \alpha$,

$$\eta_\alpha = \sqrt{N}\, w\, F_{\mathcal{N}}^{-1}(1 - \alpha) - N D_{KL}(f_X^0 \| f_X^1),$$

where $F_{\mathcal{N}}^{-1}$ is the inverse CDF of the standard normal distribution and $w$ is obtained from data generated under $H_0$. Numerical methods may be employed to compute $D_{KL}$ for a two-sensor formulation; however, this does not scale to a multi-sensor formulation. In such scenarios, existing data-driven approaches such as bootstrapping, or extreme-value theory (EVT) based distribution fitting of $T_{GLR}$ under $H_0$, yield reliable approximations of $\eta_\alpha$. The latter approach based on EVT is especially attractive for detection with $\alpha$-stable distributions. This is because EVT indicates that the tails of the null distribution of likelihood ratio test statistics asymptotically follow a generalized Pareto distribution. This behavior would be even more evident for stable-distributed observations, since such random variables exhibit Pareto tails [27]. Ozturk et al. [72] provide a detailed treatment of estimating the detector threshold based on EVT.
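The asymptotic threshold $\eta_\alpha = \sqrt{N}\, w\, F_{\mathcal{N}}^{-1}(1-\alpha) - N D_{KL}$ can be evaluated directly once $w$ and the KL divergence are available. The sketch below only illustrates that arithmetic; the function name and all numerical inputs are hypothetical, and in practice $w$ and $D_{KL}$ would have to be estimated or computed numerically as discussed above.

```python
from math import sqrt
from statistics import NormalDist


def glrt_threshold(alpha, n, w, d_kl):
    """eta_alpha = sqrt(N) * w * Phi^{-1}(1 - alpha) - N * D_KL(f0 || f1)."""
    z = NormalDist().inv_cdf(1.0 - alpha)  # standard normal quantile
    return sqrt(n) * w * z - n * d_kl


# Hypothetical values: false-alarm rate 1%, window of 50 samples
eta = glrt_threshold(alpha=0.01, n=50, w=0.8, d_kl=0.12)
print(round(eta, 2))
```

A smaller false-alarm probability increases the normal quantile and therefore raises the threshold, while a larger divergence between the two models lowers it.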


4.4  Performance  evaluation 

In this section, we first illustrate the performance of the copula-based GLRT with the aid of simulated data from specifically constructed examples (Section 4.4.1). We then address computational challenges in Section 4.4.2 and discuss footstep detection, using the seismic sensor data described in Chapter 2, in Section 4.4.3.

4.4.1  Simulated  examples 

For each hypothesis, $50 \times 10^5$ pseudorandom sample pairs, representing dependent $\alpha$-stable sensor measurements $x_j$, are generated using MATLAB [61]. Estimates $\hat{\psi}_i$, $\hat{\phi}_{c^*}$ and $\hat{u}_{ij}$, and the test statistic, $T_{GLR}$, are computed for every distinct group of $N = 50$ samples. $P_F$ and $P_D$ values are evaluated over the resultant $10^5$-length sequence of $T_{GLR}$ values.

We use Kendall's tau, $\tau$, to quantify the dependence for a given copula, $c$, instead of directly specifying $\phi_c$. This allows for a common basis for comparison across all examples being considered. A rank-based measure of dependence, $\tau$ is defined as the difference of probabilities of concordance and discordance. For the copula CDF, $C$, $\tau = 4\,E[C(u, v; \phi_c)] - 1$, and thus there exists a one-to-one relationship between $\tau$ and $\phi_c$ [65]. Analogous to $\rho$, $\tau \in [-1, 1]$: $\tau < 0$ and $\tau > 0$ indicate negative and positive dependence, respectively; independence implies that $\tau = 0$.
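For the Gaussian and t copulas, the one-to-one relationship between $\tau$ and the correlation parameter takes the well-known form $\rho = \sin(\pi\tau/2)$. A minimal sketch of the conversion in both directions follows; the function names are hypothetical.

```python
from math import asin, pi, sin


def tau_to_rho(tau):
    """Correlation parameter of a Gaussian or t copula for a given Kendall's tau."""
    return sin(pi * tau / 2.0)


def rho_to_tau(rho):
    """Inverse map: Kendall's tau implied by the correlation parameter."""
    return 2.0 * asin(rho) / pi


for tau in (0.1, 0.5):
    print(tau, round(tau_to_rho(tau), 3))  # 0.1 -> 0.156, 0.5 -> 0.707
```

Other copula families have their own maps between $\tau$ and the dependence parameter, so this conversion should not be reused outside the elliptical case.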

For the three examples considered below, in (4.41), (4.43) and (4.44), the parameter values, as listed, are used for data generation. While evaluating the performance of our proposed approach, boxed terms are assumed unknown for data processing, and are determined by estimation or model selection as a part of our detection methodology.

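Example 4.1 (Dependent S$\alpha$S distributions).

$$
\begin{aligned}
H_0 &: \; X_1 \sim S(1.3, 0, 0.1, 0), \quad X_2 \sim S(1.5, 0, 0.1, 0), \quad c_0 = c_{\mathcal{N}}(\tau = 0.1) \\
H_1 &: \; X_1 \sim S(1.3, 0, 0.1, \boxed{0.1}), \quad X_2 \sim S(1.5, 0, \boxed{0.2}, 0), \quad c_1 = \boxed{c_t(\tau = 0.5, \nu = 3)}
\end{aligned}
\qquad (4.41)
$$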

Sensor 1 measures shift in mean, and Sensor 2 measures change in dispersion. The dependence under the null hypothesis is symmetric and is modeled using the Gaussian copula, $c_{\mathcal{N}}$. The dependence under $H_1$ is modeled using a t-copula, $c_t$, with $\nu = 3$ degrees of freedom (DoF). Both Gaussian and t copulas use the correlation coefficient, $\rho$, as the dependence parameter (see Table 4.2). Under $H_0$, $\tau = 0.1 \Leftrightarrow \rho = 0.15$ indicates weaker dependence compared to $\tau = 0.5 \Leftrightarrow \rho = 0.7$ under $H_1$. The empirical receiver operating characteristic (ROC) using $T_{GLR}$ as the test statistic is shown as the solid curve in Fig. 4.1. Recall that $T_{GLR}$, in (4.12), uses the copula selection procedure in (4.14).

Fig. 4.1: ROC for Example 1: dependent S$\alpha$S distributions.

Often, one is tempted to assume that the dependence model under the null hypothesis also prevails under the alternative hypothesis. This assumption implies that instead of using $c^*$, as would be obtained using (4.14), we use the Gaussian copula under $H_1$ so that the test statistic is

$$T' = \sum_{j=1}^{N} \log\left\{ f_X(x_j; \{\hat{\delta}_1, \hat{\gamma}_2, \hat{\phi}_{c_{\mathcal{N}}}\}^T) / f_X(x_j; \theta_0) \right\} \qquad (4.42)$$

with $N = 50$. While the Gaussian copula under $H_0$ does not capture any tail dependence, the t-copula exhibits both lower and upper tail dependence. Lower DoF values indicate heavier dependence in the tails, i.e., extreme events co-occur with greater probability. As $\nu \to \infty$, a t-copula converges to a Gaussian copula. In that sense, $\nu$ controls the amount of tail dependence. As a consequence of mismodeling the copula, the tail dependence is inadequately characterized and the detector using $T'$ suffers a 10% decrease in $P_D$ for $P_F < 10^{-3}$. The ROC for $T'$ is the dashed curve in Fig. 4.1.
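The role of $\nu$ can be made explicit with the tail-dependence coefficient of the bivariate t-copula. The expression below is the standard result from the copula literature rather than a formula taken from this chapter; $t_{\nu+1}$ denotes the CDF of a univariate Student-t distribution with $\nu + 1$ degrees of freedom.

```latex
\lambda_L = \lambda_U
  = 2\, t_{\nu+1}\!\left( -\sqrt{\frac{(\nu + 1)(1 - \rho)}{1 + \rho}} \right),
\qquad
\lambda_L^{\text{Gaussian}} = \lambda_U^{\text{Gaussian}} = 0 \quad (|\rho| < 1).
```

For fixed $\rho$, the argument of $t_{\nu+1}$ moves further into the lower tail as $\nu$ grows, so the coefficient shrinks toward the Gaussian value of zero, in line with the limiting behavior described above.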

Example 4.2 (Nearly normal distributions). As in Example 1, Sensor 1 and Sensor 2 measure mean and dispersion, respectively. The standard normal distribution is equivalent to $S(2, 0, 0.5, 0)$. A tail index of 1.9 comes close to a normal distribution, but the tail still decays at a polynomial rate. The problem is set up as follows:

$$
\begin{aligned}
H_0 &: \; X_1 \sim S(1.9, 0, 1, 0), \quad X_2 \sim S(1.9, 0, 1, 0), \quad c_0 = c_{\mathcal{N}}(\tau = 0.064) \\
H_1 &: \; X_1 \sim S(1.9, 0, 1, \boxed{\,?\,}), \quad X_2 \sim S(1.9, 0, \boxed{1.5}, 0), \quad c_1 = \boxed{c_t(\tau = 0.41, \nu = 15)}
\end{aligned}
\qquad (4.43)
$$


The values of $\tau$ under $H_0$ and $H_1$ correspond (approximately) to correlation coefficient values of 0.1 and 0.6, respectively. At $\nu = 15$, the t-copula exhibits moderate to low tail dependence. Example 1 demonstrated the effect of assuming the null dependence model in the alternative hypothesis. A more egregious assumption is one of joint normality, as it represents a case where, apart from the copula, the marginal model is also mismatched with respect to the DGP. If $X_i \sim \mathcal{N}(\mu_i, \sigma_i^2)$, then $\sigma_i^2 = 2\gamma_i^2$. The GLR statistic, $T'$, assumes joint normality under both hypotheses. For $H_0: X \sim \mathcal{N}(\mu_0, C_0)$, $\mu_0 = [0, 0]^T$ and $C_0 = \begin{bmatrix} 2 & 0.2 \\ 0.2 & 2 \end{bmatrix}$. Therefore,

$$T' = N \log|C| + \sum_{j=1}^{N} \left[ (x_j - \mu)^T C^{-1} (x_j - \mu) - x_j^T C_0^{-1} x_j \right],$$

where $N = 50$ and $C$, $\mu$ are the covariance matrix and mean vector of the normal model under $H_1$.

Fig. 4.2 compares the ROCs using $T_{GLR}$ and $T'$. The severe loss in $P_D$ is clearly evident for the latter case.


Fig. 4.2: ROC for Example 2: nearly normal distributions.

Example 4.3 (Skewed and asymmetrically dependent distributions). The problem setup is as follows:

$$
\begin{aligned}
H_0 &: \; X_1 \sim S(1.5, 0, 0.1, 0), \quad X_2 \sim \mathcal{L}(0, 0.25), \quad c_0 = 1 \text{ (Independent)} \\
H_1 &: \; X_1 \sim S(1.5, 0, 0.1, \boxed{0.1}), \quad X_2 \sim \mathcal{L}(0, \boxed{\,?\,}), \quad c_1 = \boxed{c_{Gu}(\tau = 0.6)}
\end{aligned}
\qquad (4.44)
$$

where $\mathcal{L}(\delta, \gamma)$ is a Lévy distribution so that $\mathcal{L}(\delta, \gamma) = S(0.5, 1, \gamma, \delta)$, and $c_{Gu}$ represents a Gumbel copula. The Lévy distribution admits a closed-form p.d.f., so that for the $j$-th observation under hypothesis $k$,

$$f(x_{2j}; \delta_k, \gamma_k) = \sqrt{\frac{\gamma_k}{2\pi}}\, \frac{1}{(x_{2j} - \delta_k)^{3/2}} \exp\left[ -\frac{\gamma_k}{2 (x_{2j} - \delta_k)} \right]$$

for $x_{2j} \in [\delta_k, \infty)$. For $\delta = 0$, the MLE for $\gamma$ is,

$$\hat{\gamma} = N \Big/ \sum_{j=1}^{N} \frac{1}{x_{2j}}.$$

The ROCs in Fig. 4.3 show that, for a given $P_F$ value, the detector using $T_{GLR}$ outperforms the detector assuming independence under $H_1$ (i.e., set the third term in (4.12) to 0).

Fig. 4.3: ROC for Example 3: skewed and asymmetrically dependent distributions.
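The closed-form Lévy expressions referred to in Example 4.3 can be checked numerically. The sketch below assumes the standard Lévy density $\sqrt{\gamma/(2\pi)}\,(x-\delta)^{-3/2}\exp[-\gamma/(2(x-\delta))]$ and, for $\delta = 0$, the ML scale estimate $\hat{\gamma} = N/\sum_j (1/x_j)$. It draws Lévy samples through the identity that $\gamma/Z^2$ is Lévy distributed when $Z$ is standard normal. Function names, the seed and the sample size are hypothetical choices.

```python
import math
import random


def levy_pdf(x, gamma, delta=0.0):
    """Levy density sqrt(gamma/(2*pi)) * (x-delta)^(-3/2) * exp(-gamma/(2*(x-delta)))."""
    if x <= delta:
        return 0.0
    z = x - delta
    return math.sqrt(gamma / (2.0 * math.pi)) * z ** -1.5 * math.exp(-gamma / (2.0 * z))


def levy_scale_mle(samples):
    """ML estimate of gamma when delta = 0: N / sum(1 / x_j)."""
    return len(samples) / sum(1.0 / x for x in samples)


rng = random.Random(0)
gamma_true = 0.25  # scale used for X2 under H0 in (4.44)
# If Z ~ N(0, 1), then gamma / Z^2 follows a Levy(0, gamma) distribution
samples = [gamma_true / rng.gauss(0.0, 1.0) ** 2 for _ in range(20000)]
gamma_hat = levy_scale_mle(samples)
print(gamma_hat)  # should be close to 0.25
```

The estimate depends only on the sum of reciprocals, so very large observations, which are common under a Lévy law, have little influence on it.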

Fig. 4.4 shows the contour plot of the joint density under $H_1$ from Example 3 and illustrates two key advantages of copula-based modeling. First, while the observations from Sensor 1 are supported on $\mathbb{R}$, the observations from Sensor 2 are supported on $[0, \infty)$. Using the copula-based approach we are able to synthesize a valid joint p.d.f. from disparate marginals, which can capture the dependence in the tails. Secondly, skewed and nonlinear dependence is also adequately modeled; this may be contrasted with the typical concentric ellipses observed when using symmetric linear dependence models.

Fig. 4.4: Contour plot for the joint density under $H_1$ from Example 3. The X and Y axes are the supports for the marginal densities of $X_1$ and $X_2$, respectively (X axis label: Supp[$S(1.5, 0, 0.1, 0.1)$]).

4.4.2  Computational  considerations 

In the examples discussed above, the hypotheses were constructed to reflect commonly observed scenarios in typical signal processing applications. For these cases, only the displacement or scale parameters need to be estimated. In many applications, however, MLE for the complete set of parameters is required and presents a significant computational burden.

Other than the number of parameters to be estimated, MLE is also constrained by (4.21). As an example, consider the case where a random variable $X \sim S(1.3, 1, 0.2, 0)$. Let $\ell_X(\gamma)$ represent the log-likelihood of $X$, expressed as a function of $\gamma$ over $N$ samples. For $\alpha = 1.3$, $\beta = 1$, $|\beta| > \min(\alpha, 2 - \alpha)$ which violates (4.21). This affects $\ell_X(\gamma)$ even if we wish to estimate only $\gamma$: $\ell_X(\gamma)$ is unbounded and $\ell_X(\gamma) \to \infty$ as $\gamma \to 0$. The MLE $\hat{\gamma}_N$ will not converge to $\gamma = 0.2$ as $N \to \infty$.

As an alternative, iterative estimators such as the Koutrouvelis regression estimator (KRE) can be used [50]. The KRE is consistent and is computationally more efficient than the MLE because it is derived from the characteristic function of a stable distribution. Furthermore, for certain cases, the KR estimates are also asymptotically normal [51]. This implies that the asymptotic distributions for $T_{GLR}$, derived in Section 4.3, typically hold for marginal estimates obtained using KRE as well. Perhaps counter-intuitively, when using skewed marginal distributions, we have observed in our experiments that improved detection performance may be obtained when the entire parameter set is estimated using KRE, when compared to using the MLE. This behavior may be observed even if the log-likelihood is maximized over a smaller subspace of unknown parameters.

For copula parameters, estimating Kendall's tau ($\tau$) can be considered as a computationally efficient alternative to the MLE. Kendall's tau can be estimated nonparametrically. The estimates are consistent and since, for each copula in Table 4.2, there exists an invertible function $\xi_c$ such that $\tau = \xi_c(\phi_c)$, we can obtain a copula parameter estimate, $\hat{\phi}_c = \xi_c^{-1}(\hat{\tau})$, where

$$\hat{\tau} = (N_c - N_d)/(N_c + N_d) = (N_c - N_d) \Big/ \binom{N}{2}$$

for $N_c$ concordant pairs and $N_d$ discordant observation pairs.⁷

⁷ Observation pairs $(x_{1j}, x_{2j})$ and $(x_{1j'}, x_{2j'})$ are concordant if $(x_{1j} - x_{1j'})(x_{2j} - x_{2j'}) > 0$, and discordant if $(x_{1j} - x_{1j'})(x_{2j} - x_{2j'}) < 0$ [65].

In the following subsection, the KRE and Kendall's tau based estimators are used to obtain parameter estimates for the log-likelihood ratios for footstep data collected in indoor environments. A footstep detection problem against a null "background" hypothesis is formulated. Under the alternative "footstep" hypothesis, inter-sensor dependence is modeled using copulas. For the footstep detection problem, an additional element of uncertainty is that the true data generating copula is unknown and may not be contained in the copula library. The copula selection process, therefore, does not guarantee, even asymptotically, that the true copula will be chosen. However, the best model, in the KL divergence sense, will be selected. This is theoretically consistent with minimum description length (MDL) based approaches to model selection, especially for single parameter models. The CDF arguments for the copula under the alternative model are obtained using the empirical CDF in (4.32).
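The nonparametric Kendall's tau estimate based on concordant and discordant pairs, followed by inversion to a copula parameter, can be sketched as follows. The pair-counting loop is quadratic in the window length, which is acceptable for short windows. The inversion shown is the Gaussian/t relation $\rho = \sin(\pi\hat{\tau}/2)$; the data values and function name are hypothetical.

```python
from itertools import combinations
from math import pi, sin


def kendall_tau(x, y):
    """(N_c - N_d) / C(N, 2), counted over all pairs of observations."""
    n_c = n_d = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            n_c += 1  # concordant pair
        elif s < 0:
            n_d += 1  # discordant pair
    n = len(x)
    return (n_c - n_d) / (n * (n - 1) / 2)


# Hypothetical paired observations from two sensors
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
tau_hat = kendall_tau(x, y)       # 8 concordant, 2 discordant pairs -> 0.6
rho_hat = sin(pi * tau_hat / 2.0)  # parameter of a Gaussian or t copula
print(tau_hat, round(rho_hat, 3))
```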

4.4.3  Footstep  Detection 

The preceding discussion considered simulated data. The remainder of this section discusses results obtained on applying the proposed detection scheme to the indoor seismic data described in Chapter 2. In this section, we restrict our attention to the background and walking trials obtained from the two sensors at the "center" of the array.

Recall that each "walking" trial lasted approximately 12 s and the duration of the background data is 30 s. For the analysis in this section, the background data was split into a pilot (training) set and a test set. The background and footstep datasets were each split into 1500 non-overlapping windows with 500 samples per window. The average walking pace is measured to be approximately 2 steps per second; therefore, for a 1 kHz sampling rate, a window length of 500 samples allows us to capture the dynamics of a footfall within one observation window. The data points in each window are used for parameter estimation and calculating a corresponding test statistic. Consequently, 1500 decision windows are available over which $P_F$ and $P_D$ evaluations are made.

Data analysis for $\alpha$-stable characterization


The background and footstep data, acquired from the geophone sensors, were analyzed in Chapter 2, in which we noted the significantly heavy-tailed nature of footstep data. In the following discussion we analyze the $\alpha$-stable characterization of this data.

Specifically, we analyzed the $\alpha$-stable fit for footstep data to contrast its behavior with respect to the background (null hypothesis) data. Table 4.3 lists the mean and median, together with the standard error and median absolute deviation (MAD), for the fitted $\alpha$-stable parameters. These statistics are also shown for the Kendall's tau estimate of the data. The mean and median values are identical to the second decimal place for the $\alpha$-stable parameters, especially for the background. A minor discrepancy exists between the mean and median values for $\tau$, but $\tau$ for background data is rather small, and this difference does not affect our analysis and results. In the ensuing implementation of the detection algorithm, if there is a discrepancy between mean and median values, we use the median estimates as they are more robust. While $\tau$ values appear small for the footstep data, this must be taken in context with two points: (a) the Kendall's tau for the footstep data is an order of magnitude greater than the $\tau$ estimates for background, and (b) the $\tau$ estimates are affected by the interstitial periods of background that occur between two consecutive footfalls. When the footfall periods are isolated, $\tau$ estimates for these durations vary between 0.25 and 0.3.
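The robust summaries reported in Table 4.3, the median and the median absolute deviation, can be computed per parameter across decision windows as sketched below. The function name and the example estimates are hypothetical and only illustrate why the median is preferred when a few windows produce outlying estimates.

```python
from statistics import mean, median


def mad(values):
    """Median absolute deviation: median(|z_j - median(Z)|)."""
    med = median(values)
    return median(abs(v - med) for v in values)


# Hypothetical per-window estimates with one outlying window
estimates = [1, 2, 3, 4, 100]
print(mean(estimates), median(estimates), mad(estimates))  # 22, 3, 1
```

The single outlying window shifts the mean substantially while leaving the median and the MAD essentially unchanged.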

Table 4.3: Summary statistics for parameter estimates of seismic data. Each cell lists "mean, median" with "[standard error, median absolute deviation]" beneath.

Hyp.  Sensor i   α               β                 γ                 δ                     τ
BG    1          2, 2            -                 0.54, 0.54        (illegible)           0.02, 0.03
                 [0, 0]          [-]               [0.001, 0.03]     [4.8·10^-4, 0.02]     [0.002, 0.06]
BG    2          2, 2            -                 0.17, 0.17        6·10^-4, 0
                 [0.001, 0]      [-]               [0.0005, 0.01]    [3·10^-4, 0.01]
FS    1          1.28, 1.23      0.05, 0.01        0.49, 0.47        0.02, 0.01            0.1, 0.1
                 [0.01, 0.32]    [0.007, 0.039]    [0.004, 0.1]      [0.004, 0.03]         [0.003, 0.06]
FS    2          1.19, 1.14      -0.004, 0         0.51, 0.48        0.002, 0
                 [0.01, 0.28]    [0.005, 0.04]     [0.005, 0.11]     [0.0003, 0.03]

BG: Background; FS: Footstep.
Median Absolute Deviation, MAD(Z) = median(|z_j - median(Z)|)

Fig. 4.5: Scatter plot of pilot background data (5000 observation pairs). The 99% confidence ellipse is shown for a $\mathcal{N}([0, 0]^T, \Sigma)$ distribution, where $\Sigma_{1,1} = 0.58$, $\Sigma_{1,2} = \Sigma_{2,1} = 0.009$ and $\Sigma_{2,2} = 0.05$ are the elements of the covariance matrix $\Sigma$.

Fig. 4.5 shows the scatter plot of 5000 randomly chosen observation pairs from the pilot background data along with a 99% confidence ellipse for a bivariate $\mathcal{N}([0, 0]^T, \Sigma)$ distribution with covariance matrix

$$\Sigma = \begin{bmatrix} 0.58 & 0.009 \\ 0.009 & 0.05 \end{bmatrix}.$$

We notice that almost all points are encompassed within the confidence ellipse. The diagonal entries for $\Sigma$ are obtained by using the relation $\sigma^2 = 2\gamma^2$ for a normal distribution, and applying it to the $\gamma$ values in Table 4.3 corresponding to background data. The off-diagonal covariance elements are obtained by noting that the correlation coefficient $\rho = \sin(\tau\pi/2)$ for a Gaussian copula. A tail-index value of $\alpha = 2$ and the scatter plot in Fig. 4.5 suggest that a bivariate normal is a satisfactory model for the background data. Note that a bivariate Gaussian copula with Gaussian marginals is, in effect, a bivariate Gaussian.
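The mapping from the fitted background parameters to the covariance matrix $\Sigma$ uses only the two relations stated in the text, $\sigma^2 = 2\gamma^2$ and $\rho = \sin(\tau\pi/2)$. The sketch below applies them to the median background values of $\gamma$ and $\tau$ as read from Table 4.3; the function name is hypothetical, and small differences from the printed matrix can arise because the tabulated inputs are rounded.

```python
from math import pi, sin, sqrt


def background_covariance(gamma1, gamma2, tau):
    """Build a 2x2 covariance from stable scale parameters and Kendall's tau."""
    var1 = 2.0 * gamma1 ** 2          # sigma^2 = 2 * gamma^2 when alpha = 2
    var2 = 2.0 * gamma2 ** 2
    rho = sin(pi * tau / 2.0)          # Gaussian-copula correlation
    cov = rho * sqrt(var1 * var2)
    return [[var1, cov], [cov, var2]]


# Median background estimates for the two sensors
sigma = background_covariance(gamma1=0.54, gamma2=0.17, tau=0.03)
for row in sigma:
    print([round(v, 4) for v in row])
```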

For the footstep data, the MAD for $\beta$ and $\delta$ values is significantly smaller than the MAD for $\alpha$ and $\gamma$ values. The footstep data are, therefore, modeled as S$\alpha$S distributions, with unknown $\alpha$ and $\gamma$. Histogram plots for $\alpha$ and $\gamma$ corresponding to footstep signals from each sensor are shown in Fig. 4.6. We observe that the values of $\alpha$ and $\gamma$ do not cluster around the median values reported in Table 4.3. Instead, they show approximately bimodal behavior. We infer that this behavior is observed because footfalls are separated by periods of interstitial background.

Fig. 4.6: Histogram of $\alpha$ and $\gamma$ values for the footsteps data (panels: Sensor 1, Sensor 2).

Results

The detection problem for the background hypothesis, $H_0$, vs. the footsteps hypothesis, $H_1$, is formulated as,

$$
\begin{aligned}
H_0 &: \; [X_1, X_2]^T \sim \mathcal{N}([0, 0]^T, \Sigma) \\
H_1 &: \; X_1 \sim S(\boxed{\alpha_1}, 0, \boxed{\gamma_1}, 0), \quad X_2 \sim S(\boxed{\alpha_2}, 0, \boxed{\gamma_2}, 0), \quad \boxed{c_1}
\end{aligned}
\qquad (4.46)
$$

$H_0$, as written in (4.46), is the normal representation of the stable distribution as parametrized for background data in Table 4.3, with $c_0$ being the Gaussian copula. The ROC for the copula-based detector using KRE for marginal parameter estimation is shown in Fig. 4.7. Since the sensors are of the same modality (geophones), the ROC for individual sensors is also shown for comparison. The copula-based fusion, along with the $\alpha$-stable modeling, yields superior $P_D$, especially at lower $P_F$ values. Table 4.4 shows the percentage of decision windows in which each copula family was selected. Recall that the true copula is not known under $H_1$. The copula selection process mostly selects the t copula. This is consistent with the tail dependence properties of the t copula, i.e., for the footfall periods a copula that can adequately model co-occurring footstep spikes is chosen. The Frank copula, on the other hand, is an Archimedean copula which has zero tail dependence [65]. The 40 of 1500 windows ($\approx 2.66\%$), modeled as a Frank copula, are indicative of the lack of tail dependence during the background periods which are interspersed between consecutive footfalls. The copula selection under $H_0$ is more evenly spread out over the copula library. This is largely due to low $\tau$ values, which means that $c_m \approx 1$ for all $c_m \in \mathcal{C}$.

Table 4.4: Percentage of copulas selected under each hypothesis

Hypothesis   Gaussian   Student-t   Clayton   Frank   Gumbel
H_0          27.96      23.04       11.72     23.97   13.31
H_1          0          96.47       0         2.66    0.87

4.5  Summary 

In this chapter, we have developed the asymptotic theory for detection performance when sensor data are heavy-tailed and spatially dependent. The spatial dependence was modeled using copula theory, and the heavy-tailed nature of the data was modeled using $\alpha$-stable distributions. A detection problem was formulated, in the Neyman-Pearson framework, to discriminate between a known null process and an unknown composite hypothesis. When applied to footstep data, we observed that tail dependence plays an important role in the copulas that are selected as a part of the detection problem. We also observed that, while the $\alpha$-stable models are able to explain the behavior of the observed data, parameter estimation - necessary for the construction of the test statistic - is a computationally challenging task. The computational load incurred is rather severe, in spite of limiting our simulations and tests on real data to two sensors. In order to scale a copula-based detection scheme for dependent heavy-tailed data, we need to use more tractable and flexible models for both marginal and copula modeling. One of the alternative approaches to marginal modeling is to use a nonparametric kernel-based approach. This was used in the bivariate context on outdoor seismic-acoustic data, collected by ARL, with similar results [34]. This, and other approaches, are discussed in the context of multivariate modeling in Chapter 5.

Fig. 4.7: ROC for Background vs. Footstep detection.

Chapter 5

Dependence Modeling for
Detection Using Multiple Sensors

In Chapter 4, we discussed copula-based detection for heavy-tailed signals and applied the GLR test statistic to a bivariate or two-sensor formulation for copula-based detection. In this chapter, we address copula construction and model selection issues for the multivariate (multisensor) case. These considerations influence the detector design, which will be discussed in detail along with alternative approaches to model the distribution of sensor observations.

The rest of the chapter is organized as follows. The mathematical formulation of the detection problem is presented in Section 5.1. Since we consider the dependence between data acquired by multiple sensors, we need to consider the practical implications of building a multivariate distribution. We elaborate upon the issues pertinent to the construction of a multivariate copula, and thus a multivariate distribution, in Section 5.2. We present our results in Section 5.4 and provide concluding remarks in Section 5.5.

5.1 Problem Formulation

As in Chapter 4, we formulate the detection problem under the Neyman-Pearson framework. We denote the sensor observations by $x_{ij}$, where $i = 1, 2, \ldots, L$ denotes the sensor index and $j = 1, 2, \ldots, N$ denotes the time index. That is, a decision window of $N$ samples per sensor is used. Similar to Chapter 4, we make a simplifying assumption that the signals are i.i.d. over time. Also recall from the background data analysis, in Section 4.4.3, that we observed a very low correlation under $H_0$. While we explored the detection problem in its full generality in Chapter 4, in this chapter we make a simplifying assumption that the sensor observations are normally distributed and are independent under $H_0$. This will facilitate a clearer exposition of the multi-sensor aspects, i.e., multivariate copula, of the detection problem.

We, therefore, have the following binary hypothesis testing problem,

$$
\begin{aligned}
H_0 &: \; f_X(x_j) = \prod_{i=1}^{L} \left(\sqrt{2\pi\sigma_i^2}\right)^{-1} \exp\left[-x_{ij}^2/(2\sigma_i^2)\right] \\
H_1 &: \; f_X(x_j) = \prod_{i=1}^{L} f(x_{ij}) \cdot c(F_1(x_{1j}), \ldots, F_L(x_{Lj}) \,|\, \phi)
\end{aligned}
\qquad (5.1)
$$

where $x_j = [x_{1j}, x_{2j}, \ldots, x_{Lj}]$ and $\sigma_i$ is the standard deviation of the background process observed by the $i$-th sensor. Note that $c(\cdot|\phi)$ is a multivariate copula density function parametrized by the dependence parameter vector $\phi$.

In this chapter, we assess the performance of various Minimum Description Length (MDL) based copula selection schemes. MDL-based model selection schemes are heuristics developed over likelihood-based model selection. They include penalty terms, which are functions of parameter dimensionality and sample size, and emphasize model parsimony. Details on how to obtain the appropriate $c(\cdot|\phi)$ are deferred to the next section. Also, in practice, the parameters $\sigma_i$ and $\phi$ are generally not known and are estimated using maximum likelihood estimation (MLE). This is different from Chapter 4, where, in order to develop the theory, we assumed a simple null hypothesis. Here, both $H_0$ and $H_1$ are composite.
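To make the idea of a penalized selection rule concrete, the sketch below uses the classical two-part MDL score, $-\log L + (k/2)\log N$, where $k$ is the number of copula parameters and $N$ the window length. This particular penalty is an assumption chosen for illustration; the specific MDL variants assessed in this chapter are introduced later. The candidate names, log-likelihood values and function names are hypothetical.

```python
from math import log


def mdl_score(log_likelihood, num_params, num_samples):
    """Two-part code length: -log L + (k / 2) * log N (smaller is better)."""
    return -log_likelihood + 0.5 * num_params * log(num_samples)


def select_copula(candidates, num_samples):
    """candidates maps a name to (maximized log-likelihood, number of parameters)."""
    return min(candidates, key=lambda name: mdl_score(*candidates[name], num_samples))


# Hypothetical maximized copula log-likelihoods for one decision window
candidates = {"gaussian": (12.4, 1), "student_t": (14.1, 2), "frank": (9.8, 1)}
best = select_copula(candidates, num_samples=500)
print(best)  # gaussian
```

In this example the t copula attains the highest likelihood, but its extra parameter costs more than the gain in fit at $N = 500$, so the more parsimonious Gaussian copula is selected.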
The test statistic employed is the likelihood ratio given as,

$$T(x_j) = \frac{f_X(x_j \,|\, H_1)}{f_X(x_j \,|\, H_0)} \qquad (5.2)$$

and is evaluated at the fusion center from the received data. For a given copula density, the marginal distribution under $H_1$, $f(x_{ij})$, will determine the performance of the detector. For nonstationary environments, determining the marginal distribution of sensor observations is often a challenging task. We now discuss three possible detection schemes that essentially model the marginals in different ways. Each scheme gives us a test statistic, $T_k(x_j)$, $k = 1, 2, 3$. The test compares the accumulated $\log T_k(x_j)$ against a threshold $\eta$,

$$\sum_{j=1}^{N} \log T_k(x_j) \underset{H_0}{\overset{H_1}{\gtrless}} \eta, \quad k = 1, 2, 3 \qquad (5.3)$$

5.1.1  Complete  ignorance 

In  this  case,  we  assume  the  worst  case  in  that  modeling  the  marginals  is  not  feasible.  The 
detector  ignores  the  marginal  information  and  is  completely  based  on  the  copula  density.  The 
test  statistic  in  this  case  is, 


Ti(xj)  =  c(Fi(xij), . .  .,FL(xLj)\<f>)  (5.4) 

where  0  is  the  ML  estimate  of  the  copula  parameters  and  F,(xtj)  denotes  the  empirical  prob¬ 
ability  integral  transform  for  the  i-th  sensor,  which  is  calculated  as  in  Eq.  (4.32).  Note  that 
Eq.  (5.4)  is  a  likelihood  ratio  with  /X(xj  |  H0)  =  1  since  it  is  the  L-fold  product  of  the  uniform 
probability  density  U(0, 1).  This  approach  is  similar  to  the  detector  used  in  [89];  the  difference 
here  is  in  the  construction  and  use  of  multivariate  copulas.  The  empirical  probability  integral 
transform (EPIT) provides the uniformly distributed arguments for the copula density. For
notational simplicity we use $u_{ij}$, i.e.,

$$u_{ij} = \hat{F}_i(x_{ij}) \tag{5.5}$$

In other words, the likelihood ratio uses only the information available after the variable
transformation, i.e., the EPIT.
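The transform can be sketched in a few lines. This is a minimal illustration that assumes the EPIT is the ordinary empirical CDF evaluated at each sample of one sensor's frame; the function name `empirical_pit` and the optional `rescale` flag are hypothetical and not taken from the text.

```python
from bisect import bisect_right


def empirical_pit(samples, rescale=False):
    """Map each sample of one sensor's frame to its empirical CDF value.

    With rescale=False the value is (number of samples <= x) / N.
    With rescale=True the denominator is N + 1, which keeps every output
    strictly below 1 (an assumption added here, not part of the text).
    """
    n = len(samples)
    ordered = sorted(samples)
    denom = n + 1 if rescale else n
    # bisect_right counts how many ordered values are <= x
    return [bisect_right(ordered, x) / denom for x in samples]


frame = [0.3, -1.2, 2.5, 0.1]
u = empirical_pit(frame)
```

The rescaled variant may be preferable in practice because many copula densities are unbounded or undefined at an argument of exactly 1.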


5.1.2  Approximate  modeling 

Under  this  scheme,  the  marginal  density  is  parametrized  by  a  known  p.d.f.  that  can  model  some 
critical  properties  of  the  signal  under  consideration.  We  emphasize  that  this  is  approximate 
modeling  since,  unlike  speech  signals  which  have  well-established  p.d.f.  models,  it  may  not  be 
feasible  to  accurately  model  other  types  of  signals  with  sufficient  generality.  For  example,  with 
the  footstep  data,  the  signal  characteristics  are  dependent  on  the  type  of  floor,  the  background 
environment  and  the  nature  of  the  mixing  of  footsteps  and  background.  For  our  dataset,  we 
have  observed  that  the  logistic  distribution  provides  a  workable  approximation  for  the  heavier 
tails  due  to  footfalls  and  is  also  able  to  capture  the  symmetric  nature  of  the  geophone  signal. 

We  denote  the  approximate  marginal  by  f(xij).  The  test  statistic  under  this  scheme  is, 


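$$T_2(\mathbf{x}_j) = \frac{\prod_{i=1}^{L} f(x_{ij}) \cdot c(u_{1j}, \ldots, u_{Lj} \mid \hat{\phi})}{\prod_{i=1}^{L} \left(\sqrt{2\pi\sigma_i^2}\right)^{-1} \exp\left(-x_{ij}^2 / (2\sigma_i^2)\right)} \tag{5.6}$$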


We  assume  the  approximate  marginal  density  to  be  the  logistic  p.d.f.  as  it  has  heavier  tails 
than the Gaussian, but at the same time is more tractable than the $\alpha$-stable density, especially for
parameter estimation. For the case of the footstep data mentioned above, recall from Fig. 4.6
that the $\alpha$ values were rather spread out, with one of the significant modes occurring at $\alpha \approx 1.3$.
Hence,  the  Cauchy  model  would  try  to  fit  the  footstep  data  with  a  heavier  tail  than  it  actually 
possesses, and thus the logistic density is a more appropriate choice. The expression for the
logistic density is given by

$$f(x_{ij}) = \frac{e^{-x_{ij}/s}}{s \left(1 + e^{-x_{ij}/s}\right)^{2}}, \tag{5.7}$$

where  s  is  the  scale  parameter  and  the  location  parameter  for  the  logistic  density  function  is 
standardized  to  0.  The  scale  parameter  is  unknown  and  in  order  to  evaluate  the  likelihood 
function  corresponding  to  the  marginal  density,  we  have  to  estimate  s. 
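A minimal sketch of the scale estimation step follows. It assumes a zero-location logistic density and maximizes the log-likelihood over $s$ with a golden-section search on $\log s$; the search method, the bracket and the function names are illustrative assumptions, since the text only states that $s$ is estimated.

```python
import math


def logistic_logpdf(x, s):
    """Log of the zero-location logistic density with scale s > 0."""
    z = x / s
    # log(1 + exp(-z)), arranged so that exp() never overflows
    log1p_term = max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
    return -z - math.log(s) - 2.0 * log1p_term


def fit_logistic_scale(samples, iters=80):
    """Maximum-likelihood scale via golden-section search on log(s).

    Assumes at least one nonzero sample. The bracket [1e-4 * max|x|, max|x|]
    is a pragmatic choice; the likelihood equation implies s_hat <= mean|x|.
    """
    def loglik(log_s):
        s = math.exp(log_s)
        return sum(logistic_logpdf(x, s) for x in samples)

    top = max(abs(x) for x in samples)
    lo, hi = math.log(top * 1e-4), math.log(top)
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        a = hi - ratio * (hi - lo)
        b = lo + ratio * (hi - lo)
        if loglik(a) < loglik(b):
            lo = a  # the maximum lies to the right of a
        else:
            hi = b  # the maximum lies to the left of b
    return math.exp(0.5 * (lo + hi))
```

The marginal log-likelihood of a frame is then the sum of `logistic_logpdf(x, s_hat)` over its samples.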


5.1.3  Nonparametric  marginal  estimation 

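Kernel density estimators [102] provide a smoothed estimate, $\hat{f}(x_{ij})$, of the true density. The
test statistic for this scheme is, therefore,

$$T_3(\mathbf{x}_j) = \frac{\prod_{i=1}^{L} \hat{f}(x_{ij}) \cdot c(u_{1j}, \ldots, u_{Lj} \mid \hat{\phi})}{\prod_{i=1}^{L} \left(\sqrt{2\pi\sigma_i^2}\right)^{-1} \exp\left(-x_{ij}^2 / (2\sigma_i^2)\right)} \tag{5.8}$$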


The  choice  of  bandwidth  of  a  kernel  based  estimator  largely  determines  the  accuracy  of 
the  density  estimate.  The  kernel  bandwidth  is  chosen  using  leave-one-out  cross-validation. 
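The selected bandwidth, $h^*$, is the minimizer of the cross-validation estimator of risk, $\hat{J}$, for a
kernel, $K$. The risk estimator may be easily computed using the approximation,

$$\hat{J}(h) = \frac{1}{hN^2} \sum_{j=1}^{N} \sum_{j'=1}^{N} K^{*}\!\left(\frac{x_{ij} - x_{ij'}}{h}\right) + \frac{2}{Nh} K(0) + \mathcal{O}\!\left(\frac{1}{N^2}\right) \tag{5.9}$$

where $K^{*}(x) = K^{(2)}(x) - 2K(x)$ and $K^{(2)}(z) = \int K(z - y) K(y)\, dy$ (see [102], p. 136). The
Gaussian kernel was selected, so that $K(x) = \mathcal{N}(x; 0, 1)$ and $K^{(2)}(z) = \mathcal{N}(z; 0, 2)$. Therefore,

$$h^* = \arg\min_{h} \hat{J}(h)$$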
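The bandwidth search can be sketched as follows. This is an illustration under stated assumptions: the risk approximation is the double-sum form given in [102], p. 136 (with the remainder term dropped), the kernel is Gaussian as in the text, and the minimization is a plain search over a user-supplied grid. The function names and the grid are hypothetical.

```python
import math


def normal_pdf(x, var):
    """Density of a zero-mean Gaussian with variance var, evaluated at x."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def cv_risk(samples, h):
    """Approximate leave-one-out cross-validation risk for a Gaussian kernel."""
    n = len(samples)
    total = 0.0
    for xi in samples:
        for xj in samples:
            d = (xi - xj) / h
            # K*(d) = K2(d) - 2 K(d), with K = N(0, 1) and K2 = N(0, 2)
            total += normal_pdf(d, 2.0) - 2.0 * normal_pdf(d, 1.0)
    return total / (h * n * n) + 2.0 * normal_pdf(0.0, 1.0) / (n * h)


def select_bandwidth(samples, grid):
    """Return the grid value with the smallest estimated risk."""
    return min(grid, key=lambda h: cv_risk(samples, h))


frame = [0.1, -0.4, 0.9, 1.3, -1.1, 0.2, 0.5, -0.7]
h_star = select_bandwidth(frame, [0.1 * k for k in range(1, 21)])
```

The double loop costs on the order of N squared kernel evaluations per candidate bandwidth, which is modest for short frames.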

The complete ignorance, logistic, and nonparametric models provide three possible alternatives
for modeling sensor data. These marginal models are combined with the copula models,
discussed next, to form the joint p.d.f.

5.2  Construction  of  Multivariate  Copulas 

In  this  section,  we  address  the  issue  of  constructing  the  multivariate  copula  density  function. 
This  is  an  important  issue,  since  although  there  are  a  large  number  of  copula  functions  defined 
in  the  literature,  the  majority  of  them  are  defined  only  as  bivariate  distributions. 

Exceptions are the elliptical and Archimedean families of copulas. For example,
Jouini and Clemen [42] discuss the use of Archimedean copulas for aggregating expert opinions
from a team of decision makers. A rather severe limitation of using Archimedean copulas,
however, is that the experts in the decision making problem are necessarily exchangeable. That
is, the experts (sensors) are identical for the decision making task. This is not reasonable when
dealing with heterogeneous data. Elliptical copulas place restrictions of symmetry on the nature
of dependence, which need not hold true in general. The next subsection discusses a tree-based
approach to model multivariate dependence. The method discussed uses a hierarchical,
pairwise scheme and is free of symmetry and exchangeability restrictions.

In  the  discussion  that  follows  (Section  5.2.1),  we  assume  that  the  copula  parameter  is 
known,  and  do  not  include  it  explicitly  in  the  copula  function  for  brevity  of  notation.  We 
note  that  the  copula  parameter  is  typically  estimated  as  a  part  of  the  copula  selection  process 
(Section  5.2.2). 

5.2.1  Vines 

Kurowicka  and  Cooke  [53]  discuss  a  graphical  method  of  constructing  copulas  using  vines.  A 
vine is a nested set of trees, where the edges of the $k$-th tree are the nodes of the $(k+1)$-th tree,
and each tree has a maximum number of edges. The trees are called dependence vines when
they are used to encode dependence structures in multivariate distributions. There are several
vine architectures possible; Bedford and Cooke [10] present a graphical model that focuses on
pairwise interactions of dependent variables using regular vines.

Two types of regular vines have been analyzed in the literature [2] in the context of expressing
multivariate copulas: the canonical vines, or C-vines, and the drawable vines, or D-vines.
For our work, we use the D-vine architecture since it is better suited to our application;
C-vines are useful when it is known that a particular sensor plays a key role in governing
inter-sensor dependencies. A vine, regular vine and D-vine are formally defined below.

Definition 5.1. $V$ is a vine on $K$ elements if,

1. $V = (T_1, \ldots, T_{K-1})$

2. $T_1$ is a connected tree with nodes $N_1 = \{1, \ldots, K\}$ and edges $E_1$; $T_k$ is a connected tree
with nodes $N_k = E_{k-1}$ for $k = 2, 3, \ldots, K-1$

$V$ is a regular vine on $K$ elements if it satisfies the additional proximity condition,

3. For $k = 2, \ldots, K-1$, if $a$ and $b$ are nodes of $T_k$ connected by an edge in $T_k$, where
$a = \{a_1, a_2\}$ and $b = \{b_1, b_2\}$ are edges in $T_{k-1}$, then exactly one of $a_1, a_2$ equals one of
$b_1, b_2$.


A regular vine is called a D-vine if each node in $T_1$ has a degree of at most 2. A D-vine
over  4  elements  is  shown  in  Fig.  5.1.  When  a  four-variate  joint  distribution  is  defined  over 
this  vine,  we  are  essentially  establishing  a  hierarchical,  pairwise  dependency  relation,  which 
can  be  expressed  through  copulas.  Each  tree  in  the  vine  represents  a  decomposition  obtained 
by  successively  conditioning  the  variables.  We  elaborate  on  this  procedure  below  using  the 
example  of  a  vine  over  three  nodes. 

Fig. 5.1: D-vine over 4 elements. Labels indicate the copula density evaluated at each tree in
the vine.

Consider a vine over nodes $\{n_1, n_2, n_3\} = N_1$. For notational convenience, in this section,
we drop the time index $j$. Each node $n_i \in N_1$ observes data $x_i$. We can write the following
pair densities,


$$f(x_1, x_2) = c_{12}(F_1(x_1), F_2(x_2)) \cdot f(x_1) f(x_2) \tag{5.10}$$

$$f(x_2, x_3) = c_{23}(F_2(x_2), F_3(x_3)) \cdot f(x_2) f(x_3) \tag{5.11}$$

where we use subscripts for the copula density function to clarify the node pairs under
consideration. In the context of vines, the copula between pairs of nodes is also referred to as the
pair-copula density. The conditional densities for the pair-copulas above are,

$$f(x_1 \mid x_2) = c_{12}(F_1(x_1), F_2(x_2)) \cdot f(x_1) \tag{5.12}$$

$$f(x_3 \mid x_2) = c_{23}(F_2(x_2), F_3(x_3)) \cdot f(x_3) \tag{5.13}$$

From Equations (5.12) and (5.13) we can derive the conditional CDFs that can be used as
arguments for the copula defined for tree $T_2$ in the vine. It is easily seen that,

$$f(x_1, x_3 \mid x_2) = f(x_1 \mid x_2) f(x_3 \mid x_2) \times c_{13|2}\big(F_{1|2}(x_1 \mid x_2), F_{3|2}(x_3 \mid x_2)\big) \tag{5.14}$$

The joint density of $x_1$, $x_2$ and $x_3$ is,

$$\begin{aligned}
f(x_1, x_2, x_3) &= f(x_2) f(x_1 \mid x_2) f(x_3 \mid x_2)\, c_{13|2}\big(F_{1|2}(x_1 \mid x_2), F_{3|2}(x_3 \mid x_2)\big) \\
&= f(x_1, x_2) f(x_3 \mid x_2)\, c_{13|2}\big(F_{1|2}(x_1 \mid x_2), F_{3|2}(x_3 \mid x_2)\big) \\
&\overset{(a)}{=} f(x_1) f(x_2)\, c_{12}(F_1(x_1), F_2(x_2))\, f(x_3 \mid x_2) \\
&\qquad \times c_{13|2}\big(F_{1|2}(x_1 \mid x_2), F_{3|2}(x_3 \mid x_2)\big) \\
&\overset{(b)}{=} f(x_1) f(x_2) f(x_3)\, c_{12}(F_1(x_1), F_2(x_2))\, c_{23}(F_2(x_2), F_3(x_3)) \\
&\qquad \times c_{13|2}\big(F_{1|2}(x_1 \mid x_2), F_{3|2}(x_3 \mid x_2)\big),
\end{aligned} \tag{5.15}$$

where (a) follows from Eq. (5.10) and (b) follows from Eq. (5.11). In a similar manner, it can
be shown that the joint p.d.f. for a 4-variable D-vine is [2],


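$$\begin{aligned}
f(x_1, x_2, x_3, x_4) = {} & f(x_1) f(x_2) f(x_3) f(x_4) \\
& \cdot c_{12}(F_1(x_1), F_2(x_2)) \cdot c_{23}(F_2(x_2), F_3(x_3)) \cdot c_{34}(F_3(x_3), F_4(x_4)) \\
& \cdot c_{13|2}\big(F_{1|2}(x_1 \mid x_2), F_{3|2}(x_3 \mid x_2)\big) \cdot c_{24|3}\big(F_{2|3}(x_2 \mid x_3), F_{4|3}(x_4 \mid x_3)\big) \\
& \cdot c_{14|23}\big(F_{1|23}(x_1 \mid x_2, x_3), F_{4|23}(x_4 \mid x_2, x_3)\big)
\end{aligned} \tag{5.16}$$

The labels in Fig. 5.1 indicate the copula density evaluated at each tree in the vine. The
density of an $L$-dimensional distribution expressed in terms of a D-vine decomposition is
given by Bedford and Cooke [9],

$$f(x_1, \ldots, x_L) = \prod_{i=1}^{L} f(x_i) \prod_{j=1}^{L-1} \prod_{k=1}^{L-j} c_{j,\, j+k \mid j+1, \ldots, j+k-1}\big(F(x_j \mid x_{j+1}, \ldots, x_{j+k-1}),\; F(x_{j+k} \mid x_{j+1}, \ldots, x_{j+k-1})\big) \tag{5.17}$$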
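To make the decomposition concrete, the three-variable case can be sketched in code. The sketch below assumes that every pair-copula is Gaussian, which is only one member of the copula library and is chosen here because its conditional CDF has a closed form; the function names are hypothetical. Inputs are the transformed values $u_i = F_i(x_i)$ and must lie strictly inside (0, 1).

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def gaussian_copula_logpdf(u, v, rho):
    """Log density of the bivariate Gaussian copula with correlation rho."""
    a, b = _STD.inv_cdf(u), _STD.inv_cdf(v)
    r2 = rho * rho
    return (-0.5 * math.log(1.0 - r2)
            - (r2 * (a * a + b * b) - 2.0 * rho * a * b) / (2.0 * (1.0 - r2)))


def gaussian_h(u, v, rho):
    """Conditional CDF F(u | v) implied by a Gaussian pair-copula."""
    a, b = _STD.inv_cdf(u), _STD.inv_cdf(v)
    return _STD.cdf((a - rho * b) / math.sqrt(1.0 - rho * rho))


def dvine3_copula_logpdf(u1, u2, u3, rho12, rho23, rho13_2):
    """Log copula density of a 3-node D-vine 1 - 2 - 3 (marginals excluded)."""
    # Tree T1: the two unconditional pair-copulas c12 and c23
    tree1 = (gaussian_copula_logpdf(u1, u2, rho12)
             + gaussian_copula_logpdf(u2, u3, rho23))
    # Tree T2: c13|2 evaluated at the conditional CDFs F(1|2) and F(3|2)
    f1_2 = gaussian_h(u1, u2, rho12)
    f3_2 = gaussian_h(u3, u2, rho23)
    tree2 = gaussian_copula_logpdf(f1_2, f3_2, rho13_2)
    return tree1 + tree2
```

Adding the three marginal log-densities to the returned value gives the joint log-density. For very extreme inputs the conditional CDFs can round to exactly 0 or 1, in which case `inv_cdf` raises an error, so a practical implementation would clip them slightly.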

5.2.2  Copula  selection 

The importance of copula selection as a vital component of copula-based designs has been
noted at various points [36, 90]. The dependence between the sensor observations may
manifest in different ways, and the copula function that best models it should be selected.
Selecting a copula function that does not adequately model the statistical dependence between
the sensor observations may result in model mismatch, subsequently deteriorating the detection
performance.

When  constructing  multivariate  copulas  using  vines,  the  copula  selection  process  has  to  be 
repeated  for  each  pair-copula  in  every  tree  in  the  vine.  We  use  a  minimum  description  length 
(MDL)  [33]  based  approach  for  model  selection.  MDL  techniques  of  model  selection  are  based 
on  the  principle  that  the  model  that  achieves  the  best  compression  is  the  model  best  suited, 
from the available alternatives, to describe the data. In our case, we do not know the “true”
copula density, $c(\cdot)$, and the best possible copula is selected from the library of copula functions
introduced in Chapter 3.

In  this  chapter,  we  compare  four  criteria  available  under  the  MDL  framework;  the  criteria 
considered  are:  (1)  Akaike  Information  Criterion  (AIC),  (2)  Bayesian  Information  Criterion 
(BIC),  (3)  Stochastic  Information  Complexity  (SIC)  and  (4)  Normalized  Maximum  Likelihood 
(NML). Suppose the copula functions are parametrized by $\phi$ of dimensionality $d$. Then, these
MDL criteria are defined, for the connected node pairs $\{n_1, n_2\} \in T$, as:

$$\mathrm{AIC} = -\sum_{j=1}^{N} \log c\big(F(x_{n_1 j}), F(x_{n_2 j})\big) + \frac{d}{2} \tag{5.18}$$

$$\mathrm{BIC} = -\sum_{j=1}^{N} \log c\big(F(x_{n_1 j}), F(x_{n_2 j})\big) + \frac{d}{2} \log N \tag{5.19}$$

$$\mathrm{SIC} = -\sum_{j=1}^{N} \log c\big(F(x_{n_1 j}), F(x_{n_2 j})\big) + \frac{1}{2} \log |\hat{\Sigma}| \tag{5.20}$$

$$\mathrm{NML} = -\sum_{j=1}^{N} \log c\big(F(x_{n_1 j}), F(x_{n_2 j})\big) + \frac{d}{2} \log\left(\frac{N}{2\pi}\right) + \log \int \sqrt{|I(\phi)|}\, d\phi \tag{5.21}$$

where $|\hat{\Sigma}|$ in Eq. (5.20) denotes the determinant of the Hessian of

$$-\sum_{j=1}^{N} \log c\big(F(x_{n_1 j}), F(x_{n_2 j}) \mid \hat{\phi}\big),$$

and $|I(\phi)|$ in Eq. (5.21) is the determinant of the Fisher information matrix evaluated over

$$c\big(F(x_{n_1 j}), F(x_{n_2 j}) \mid \phi\big).$$

Copula selection is performed for each bivariate copula term in (5.16). In order to do
this,  we  evaluate  AIC,  BIC,  SIC,  and  NML  for  each  of  the  copulas  in  the  copula  library 
(Table  4.2).  The  copula  corresponding  to  the  minimum  value  of  each  of  the  MDL  criteria  is 
selected.  Note  that  this  is  similar  to  the  copula  selection  procedure  discussed  in  Chapter  3: 
the  minimum  value  is  chosen  in  this  case  because  MDL  criteria  are  defined  as  functions  of  the 
negative  log-likelihood. 
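The selection step for a single node pair can be sketched as follows. For brevity the sketch uses a two-member library (the independence copula and the Gaussian copula) and the BIC form of the criterion, i.e. the negative log-likelihood plus $(d/2)\log N$; the grid search over the correlation and all function names are illustrative assumptions. The other criteria would differ only in the penalty term.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def gaussian_copula_loglik(scores, rho):
    """Gaussian copula log-likelihood for normal scores (a, b)."""
    r2 = rho * rho
    total = 0.0
    for a, b in scores:
        total += (-0.5 * math.log(1.0 - r2)
                  - (r2 * (a * a + b * b) - 2.0 * rho * a * b) / (2.0 * (1.0 - r2)))
    return total


def select_copula_bic(pairs):
    """Return (best family, BIC per family) for pseudo-observations in (0, 1)."""
    n = len(pairs)
    scores = [(_STD.inv_cdf(u), _STD.inv_cdf(v)) for u, v in pairs]
    # Independence copula: density 1 everywhere, no free parameter
    bic = {"independence": 0.0}
    # Gaussian copula: d = 1, correlation fitted by a coarse grid search
    grid = [k / 100.0 for k in range(-99, 100)]
    rho_hat = max(grid, key=lambda rho: gaussian_copula_loglik(scores, rho))
    bic["gaussian"] = -gaussian_copula_loglik(scores, rho_hat) + 0.5 * math.log(n)
    return min(bic, key=bic.get), bic
```

In a vine, this routine would be called once per pair-copula, with the conditional CDF values as inputs for trees above the first.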

5.2.3  Node  ordering 

The D-vine characterization of multivariate dependence constrains each node of the tree $T_1$
to have a degree of at most 2. This implies that node ordering is important, since different
orderings may give rise to different joint distributions, especially since our copula selection
is done through a library of copulae. Therefore, before constructing the D-vine, we must select
an appropriate node ordering scheme.

Since our detector capitalizes on the dependency information, we use a dependency criterion
to order the nodes. Specifically for the seismic footstep signal detection, we would like
to pair sensors that exhibit greater co-movement in their signal amplitudes. In other words, we
wish to measure the dependence behavior of the sensors at the tails. Chapter 3 discussed the
two tail dependency measures called the upper and lower tail dependence. It may be recalled
that for a continuous random vector $[X, Y]$, with marginal CDFs $F$ and $G$, and copula CDF
$C(F(X), G(Y))$,

$$\lambda_U = \lim_{u \nearrow 1} P\big(Y > G^{-1}(u) \mid X > F^{-1}(u)\big) \tag{5.22}$$

$$\lambda_U = \lim_{u \nearrow 1} \frac{1 - 2u + C(u, u)}{1 - u} \tag{5.23}$$

$$\lambda_L = \lim_{u \searrow 0} P\big(Y \le G^{-1}(u) \mid X \le F^{-1}(u)\big) \tag{5.24}$$

$$\lambda_L = \lim_{u \searrow 0} \frac{C(u, u)}{u} \tag{5.25}$$


Since a seismic signal oscillates about its mean, $\lambda_U$ and $\lambda_L$ provide similar information
for characterizing the co-movement of the signal under consideration. In the node ordering
algorithm described below we use Eq. (5.23) as the measure by which suitable nodes are paired.
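As a quick numerical illustration of the upper tail dependence coefficient, the limit can be approximated by evaluating the ratio at a value of $u$ close to 1. The sketch below does this for the Gumbel copula, for which the closed form $\lambda_U = 2 - 2^{1/\theta}$ is a standard result; the evaluation point and the function names are illustrative choices, not taken from the text.

```python
import math


def gumbel_cdf(u, v, theta):
    """Gumbel copula CDF for theta >= 1 and u, v in (0, 1)."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-s ** (1.0 / theta))


def upper_tail_dependence(copula_cdf, u=1.0 - 1e-6):
    """Finite-u approximation of lim (1 - 2u + C(u, u)) / (1 - u) as u -> 1."""
    return (1.0 - 2.0 * u + copula_cdf(u, u)) / (1.0 - u)


lam_gumbel = upper_tail_dependence(lambda a, b: gumbel_cdf(a, b, 2.0))
lam_indep = upper_tail_dependence(lambda a, b: a * b)
```

For $\theta = 2$ the approximation is close to $2 - \sqrt{2}$, while the independence copula gives a value near zero, so a larger $\lambda_U$ indicates stronger co-movement of large values.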

Recall that $L$ denotes the number of sensors and that $T_1$ is the tree connecting the sensor
nodes. Prior to the D-vine construction, we order the nodes in $T_1$, $N_1 = \{n_1, n_2, \ldots, n_L\}$, for
each frame. For this purpose, we implement the following algorithm,

1. Choose a random ordering of sensor nodes. This is only to initialize the ordering process;
it is done just once and not repeated for each frame. Label this initial ordering of $L$ nodes
as $n_1^1, n_2^1, \ldots, n_L^1$. The superscript indexes the iterations through which the algorithm
runs.

2. For $l = 1, 2, \ldots, L-2$:

(a) Pick $n_l^l$ and compute $\lambda_U$ (Eq. (5.23)) for each of the $L - l$ pairs,
$\{n_l^l, n_p^l\}$, $p = l+1, \ldots, L$,
using the copula selected (Section 5.2.2) for that node pair.

(b) Choose the pairing that has the maximum value of $\lambda_U$. Suppose this maximum
occurs for some node $k$, $l < k \le L$, $k \in \mathbb{Z}^{+}$, that is, $\{n_l^l, n_k^l\}$ has the greatest value
of $\lambda_U$ among all node pairs. We swap and relabel the nodes as follows:

$$n_k^l \to n_{l+1}^{l+1}, \qquad n_{l+1}^l \to n_k^{l+1}, \qquad n_p^l \to n_p^{l+1}, \quad p = l+2, l+3, \ldots, L, \; p \neq k$$

3. The set of nodes $\{n_1^{L-1}, n_2^{L-1}, \ldots, n_L^{L-1}\} = N_1$ is the set of nodes in $T_1$ ordered by decreasing
tail dependence.
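The ordering steps above can be sketched as a short greedy routine. The sketch assumes the pairwise $\lambda_U$ values are available through a callable `tail_dep(a, b)` (for example, computed from the copula selected for each pair); that callable and the function name `order_nodes` are hypothetical.

```python
def order_nodes(initial_order, tail_dep):
    """Greedy D-vine ordering of sensor nodes.

    At position l, the node that has the largest tail dependence with the
    node currently at l is swapped into position l + 1.
    """
    order = list(initial_order)
    n = len(order)
    for l in range(n - 2):  # corresponds to l = 1, ..., L - 2 in the text
        current = order[l]
        # search the remaining positions l + 1, ..., n - 1
        k = max(range(l + 1, n), key=lambda p: tail_dep(current, order[p]))
        order[l + 1], order[k] = order[k], order[l + 1]
    return order


# Hypothetical pairwise tail dependence values for four sensors
lam = {
    frozenset((0, 1)): 0.1, frozenset((0, 2)): 0.9, frozenset((0, 3)): 0.2,
    frozenset((1, 2)): 0.3, frozenset((1, 3)): 0.8, frozenset((2, 3)): 0.4,
}
ordered = order_nodes([0, 1, 2, 3], lambda a, b: lam[frozenset((a, b))])
```

Because the procedure is greedy, the result depends on the first node of the initial ordering.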


5.3  Detection  algorithm 

The preceding sections discussed the individual components of the detection process. In this
section, we list the steps required for detection on an incoming data sequence of
length $N$ from each of $L$ sensors. We assume that a desired MDL criterion is selected and is used
for copula selection. During copula selection, we also assume that ML estimates are used to
determine the unknown copula parameters.

1.  For  each  i-th  sequence  of  length  N,  corresponding  to  the  sensor  i,  evaluate  the  marginal 
likelihood  using  either  logistic  or  nonparametric  methods.  For  the  logistic  model,  the 
parameter  s  in  (5.7)  is  estimated  using  MLE. 

2. Obtain a node order for the base tree $T_1$ (Section 5.2.3).

3. Evaluate the conditional CDF arguments for the copulas of $T_2$, as specified in (5.16).

4. Use the copula selection procedure (using the desired MDL criterion) to identify the
functions for $c_{13|2}$ and $c_{24|3}$.

5. Using the copula functions for $c_{13|2}$ and $c_{24|3}$, obtain the conditional CDFs for the copula
arguments of tree $T_3$.

6. Find the copula function, from the copula library, which minimizes the MDL criterion
for $c_{14|23}$.

Steps 1-6 give us the log-likelihood under $H_1$.

7. Estimate the variance of the signal, assuming normality (likelihood under $H_0$).

8. Calculate the test statistic $T_k$ using one of the definitions (5.4), (5.6) or (5.8).

9. Compare the test statistic with a threshold obtained based on the Neyman-Pearson
criterion to arrive at the $H_0$ vs. $H_1$ decision.
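The final thresholding step can be illustrated with a simple empirical calibration. This sketch assumes that test statistics computed on background-only frames are available, and it picks the threshold from their empirical distribution so that the false-alarm rate does not exceed a target; this calibration procedure and the function names are assumptions made for illustration, since the text does not specify how the threshold is obtained.

```python
import math


def np_threshold(h0_stats, pf_target):
    """Smallest observed threshold whose empirical false-alarm rate is <= pf_target."""
    ordered = sorted(h0_stats)
    n = len(ordered)
    # number of H0 statistics that may lie strictly above the threshold
    allowed = int(pf_target * n + 1e-9)
    if allowed >= n:
        return -math.inf
    return ordered[n - allowed - 1]


def decide(statistic, threshold):
    """Declare H1 when the statistic exceeds the threshold, otherwise H0."""
    return "H1" if statistic > threshold else "H0"


eta = np_threshold([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 0.2)
```

With these ten background statistics and a target of 0.2, exactly two of them exceed the returned threshold.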

5.4  Results 

In  this  section,  we  describe  the  results  obtained  from  various  cases  considered.  The  multivariate 
copula-based  detectors,  described  in  the  previous  section,  were  applied  to  the  footstep  data 
(Section  2.2).  The  footstep  and  background  signals  are  split  into  non-overlapping  frames  of 
50  samples  per  sensor.  Although  the  setup  consisted  of  a  linear  array  of  6  sensors,  the  results 
presented here use $L = 4$ sensors; the “center” sensors, i.e., the 2nd to 5th sensors, are used. In
the discussion that follows, we use receiver operating characteristics (ROCs) to characterize
the detection performance; the ROCs shown are the averages over randomly chosen ensembles
of 10 trials. Probability of false alarm and probability of detection are denoted by $P_F$ and $P_D$,
respectively. $P_F$ and $P_D$ are determined empirically by varying the threshold $\eta$.
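The empirical ROC computation can be sketched as follows, assuming the summed log test statistics have already been computed for a set of background frames and a set of footstep frames. The function name and the toy values are hypothetical.

```python
def roc_points(h0_stats, h1_stats, thresholds):
    """Empirical (P_F, P_D) pairs obtained by sweeping the threshold."""
    points = []
    for eta in thresholds:
        # fraction of background frames declared H1 (false alarms)
        pf = sum(t > eta for t in h0_stats) / len(h0_stats)
        # fraction of signal frames declared H1 (detections)
        pd = sum(t > eta for t in h1_stats) / len(h1_stats)
        points.append((pf, pd))
    return points


curve = roc_points([0, 1, 2, 3], [2, 3, 4, 5], [1.5, 3.5])
```

Averaging such curves over several randomly chosen ensembles of frames gives a smoothed ROC of the kind described above.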

Fig. 5.2 shows the ROCs comparing the different selection criteria discussed in Section
5.2.2. The different selection criteria are compared using $T_1(\mathbf{x}_j)$ (complete ignorance case)
since modeling of the marginal distributions has no effect on the copula selection. We observe
a slight improvement in overall detection performance with NML as the selection criterion.
However, at lower $P_F$ values SIC is observed to perform better. Note that these are empirical
observations made on the footstep data. It has been reported that SIC and NML
are superior approaches, especially when used with nested models [33]. We expect that these
methods will perform better than AIC and BIC when we consider, e.g., a mixture of copulae.
This is a topic for future investigation. NML also incurs a large computational cost because
$\int \sqrt{|I(\phi)|}\, d\phi$, in Eq. (5.21), is evaluated over all possible values of $\phi$. For the remainder of the
section, SIC is used as the criterion for copula selection.

Fig. 5.2: ROC comparing the different MDL-based selection criteria.

Using SIC for model selection from the copula library, we compare the performance of the
detectors based on $T_k(\mathbf{x}_j)$ for each $k = 1, 2, 3$. As expected, we observe in Fig. 5.3 that
the non-parametric modeling of the marginal distribution gives the best detection performance.
The approximate model, while easier to compute online, has a lower performance. Detection
by ignoring the marginal information also, expectedly, performs poorly in comparison.

Fig. 5.3: ROC comparing the different detectors obtained from different marginal models.
SIC is used for copula selection. CI: Complete Ignorance, AM: Approximate Modeling, NP:
Nonparametric modeling.

One of the key components of the multivariate copula construction was node ordering. The
nodes in $T_1$ correspond to sensor nodes and are ordered using the tail-dependence criterion,
$\lambda_U$ (cf. Eq. (5.23)). Recall from Section 2.2 that, in our test-bed setup, sensors are placed as
a linear array along a hallway. This suggests that there exists a natural ordering; the sensor
closest to one end of the hallway can be sensor 1 and neighboring sensors can be successively
indexed as 2, 3 and 4. However, we observe that using $\lambda_U$ to order the sensors results in a
different ordering. This is most likely due to the non-homogeneity of the seismic medium, in this
case, the hallway floor. The natural ordering is used to initialize the node ordering algorithm.
The D-vine built using this natural ordering is labeled as the “Without sensor ordering” case in
Fig. 5.4. The curve labeled as “With sensor ordering” is the detection performance corresponding
to the D-vine built using the end result of the node ordering algorithm. We observe that
tail-dependence based ordering gives superior detection performance. While the node order
changes for each data frame, i.e., each block of $N$ samples, we observe some consistencies in
the pairing patterns. We observe that {2, 4} and {3, 5} are consistently paired together. This is also
consistent with the nature of the time-series behavior observed in Fig. 2.4, where, for a given
time-interval, footstep spikes occur over the same scale for these sensor pairs.

Fig. 5.4: ROC illustrating the benefit of using $\lambda_U$-based node ordering.

Constructing a multivariate copula leads to increased system complexity as well as additional
computational effort. Fig. 5.5 shows that the additional complexity leads to substantial
gains in detection performance when compared to bivariate copula based detectors. In Fig. 5.5,
the detection results are presented for the 4-variate case versus results for a number of bivariate
cases, considering various pairs of sensors. Since the comparative performance depends
on the copula function alone, the ROCs are obtained from detectors that ignore the marginal
information.

Fig. 5.5: ROC comparing multivariate copula based detector to various bivariate copula based
detectors. $\{S_p, S_q\}$ represent sensor pairs for $p, q = 1, 2, 3, 4$ and $p \neq q$. The x-axis is on a
logarithmic scale to emphasize low $P_F$ values.

5.5 Summary

In this chapter, we have discussed detection schemes that consider multisensor dependence
using a copula-based approach. Our detector, designed in the Neyman-Pearson framework,
demonstrates that accounting for multivariate dependence leads to significant improvement
over a bivariate approach. The vine-based approach is suitable for modeling asymmetric
dependencies in a tractable manner. However, the computation of conditional CDFs and
densities, when repeated for many nodes, leads to some computational burden. Stream based parallel
computing solutions are attractive to solve such inference problems, when information fusion
is required over large scale networks. These issues, such as computational problems and
distribution estimation in a decentralized framework, are open problems ripe for future research
in the field of data fusion. The next chapter concludes this dissertation by summarizing the
main contributions discussed in this and previous chapters, and also discusses several ideas for
future research.

Chapter 6

Detection of Footsteps from Outdoor Data

In this chapter, we evaluate the performance of the copula-based detector on seismic and acoustic
data collected by the U.S. Army Research Laboratory at the southwest US border. While
the previous chapters considered fusion of seismic data from an indoor environment, this chapter
considers the fusion of seismic and acoustic data from an outdoor environment. The next
sections describe the data collection and the detection performance.

6.1  Data  collection  and  preprocessing 

We used the footstep data, made available by the US Army Research Laboratory (ARL),
collected at the southwest US border. The dataset consists of raw observations from several
sensors of different modalities that were deployed in an outdoor space to record human and animal
activity that is typical in perimeter and border surveillance scenarios. The participants in the
data collection exercise walked/ran along a predetermined path with sensors laid out along
either side of the path.

Seismic and acoustic time series for activities representing a single person walking, two
persons walking and a human leading an animal (among other examples) are available in the
ARL dataset. Each seismic/acoustic time series contains a leading 60 s of background data. We
use this as our $H_0$ data. The data are sampled at 10 kHz, and are mean centered and oscillatory
in nature.

Before applying the copula-based detector, we first pre-process the data. The time series
is split into non-overlapping frames of length $T = 512$. This raw time series data is called
$x_{Ti}(t)$, where $i = 1, 2$ is the sensor index for the acoustic and seismic modalities, respectively,
and $t$ is the time index. In keeping with Houston’s analysis that Fourier spectra for seismic and
acoustic footstep data are more informative than time-domain measurements [35], we set

$$\tilde{x}_{ij} = \left|\mathcal{F}\{x_{Ti}(t)\}\right|^{2},$$

and

$$x_{ij} = \tilde{x}_{ij} - \frac{1}{N} \sum_{j} \tilde{x}_{ij},$$

where $\mathcal{F}$ is the DFT and $j = 1, \ldots, N = 256$ is the frequency index. Our sensor measurements
are, therefore, now transformed to the frequency domain and the statistics of $\mathbf{x} = [x_{ij}]$ are used
as the input to the detector.
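The framing and transformation can be sketched as below. Several details are assumptions made for illustration: the squared DFT magnitude is used as the spectral feature, the first $T/2$ bins are kept as the $N$ frequency indices, and subtracting the mean across frequency is exposed as an optional step. A direct DFT from the standard library is used for self-containment; an FFT routine would normally replace it. All function names are hypothetical.

```python
import cmath
import math


def power_spectrum(frame):
    """Squared DFT magnitude for the first len(frame) // 2 frequency bins."""
    t_len = len(frame)
    spectrum = []
    for k in range(t_len // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / t_len)
                    for t, x in enumerate(frame))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def split_frames(series, t_len=512):
    """Non-overlapping frames of length t_len; a trailing partial frame is dropped."""
    usable = len(series) - len(series) % t_len
    return [series[i:i + t_len] for i in range(0, usable, t_len)]


def preprocess(series, t_len=512, center=True):
    """Per-frame spectral features, optionally mean-centered across frequency."""
    features = []
    for frame in split_frames(series, t_len):
        spec = power_spectrum(frame)
        if center:
            mean = sum(spec) / len(spec)
            spec = [v - mean for v in spec]
        features.append(spec)
    return features
```

With `t_len=512` each frame yields 256 values per sensor, matching the frequency index range stated in the text.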

Under the background hypothesis we have observed that the $x_{ij}$ are normally distributed and
are spatially independent. For the footstep hypothesis, we use a copula library consisting of
Gaussian,  Gumbel  and  Frank  copulas.  We  have  observed  that  due  to  the  interstitial  nature  of 
footstep data, including the independence copula (Table 4.2) in the library improves the overall
detection performance.

6.2 Overview of the detector


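In this chapter, the hypotheses are formulated as follows:

$$\begin{aligned}
H_0 &: f_0(\mathbf{x}) = \prod_{j=1}^{N} \prod_{i=1}^{L} f_0(x_{ij} \mid \psi_{0i}) \\
H_1 &: f_1(\mathbf{x}) = \prod_{j=1}^{N} \left[ \prod_{i=1}^{L} f_1(x_{ij} \mid \psi_{1i}) \right] \times c_1\big(u_{1j}(\psi_{11}), \ldots, u_{Lj}(\psi_{1L}) \mid \phi_1\big)
\end{aligned} \tag{6.1}$$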


where $L = 2$, and the index $i$ represents a seismic sensor and an acoustic sensor. Here the
distribution under the null hypothesis of background, $f_0$, is the Gaussian p.d.f. For this
application, establishing a stationary model under $H_1$ is not feasible. Therefore, $f_1$ is determined
non-parametrically and $u_{ij}$ is obtained using the empirical probability integral transform
(EPIT). The test statistic is, therefore, expressed as,


$$T_{\mathrm{NPM}}(\mathbf{x}) = \sum_{j=1}^{N} \sum_{i=1}^{L} \log \frac{\hat{f}_1(x_{ij})}{f_0(x_{ij} \mid \hat{\psi}_{0i})} + \sum_{j=1}^{N} \log c^{*}(u_{1j}, \ldots, u_{Lj}) \tag{6.2}$$

where $c^{*}$ is obtained using the copula selection process described in Chapter 4. The uniform
random variables in the copula density are evaluated using EPIT,

$$\hat{F}_i(\cdot) = \frac{1}{N} \sum_{j=1}^{N} \mathbb{1}(x_{ij} \le \cdot) \tag{6.3}$$

$$u_{ij} = \hat{F}_i(x_{ij}) \tag{6.4}$$

where $\mathbb{1}(\cdot)$ is the indicator function. The test is, therefore,

$$T_{\mathrm{NPM}}(\mathbf{x}) \underset{H_0}{\overset{H_1}{\gtrless}} \eta. \tag{6.5}$$

The marginal model under $H_1$ is determined through a kernel density estimation procedure,
as described in Chapter 5. Recall that kernel density estimators [102] provide a smoothed
estimate, $\hat{f}_1(x_{ij})$, of the true density.

6.3  Results 

For the ARL dataset, we use the statistic $T_{\mathrm{NPM}}(\mathbf{x})$ in (6.2). To generate ROCs, we compare
the test statistic to a vector of thresholds. The curve thus generated, for the case when $H_1$
corresponds to one person walking, is shown in Fig. 6.1. This curve is compared to the ROC
for the product rule, i.e., the independence assumption for $H_1$. Similar ROCs are obtained for
the cases of two persons walking and a man leading an animal, and are shown in Fig. 6.2 and
Fig. 6.3.

For all three cases, we observe that our proposed method, using copula-based dependence
characterization along with copula selection, outperforms the detector based on the
independence assumption. We further observe that the two-persons and man-leading-animal cases have a
higher probability of detection ($P_D$) for a given probability of false alarm ($P_F$), when compared
to the one person case. This is intuitive, since for the two-persons case and man-leading-animal
case, we have a higher signal to noise ratio.

6.4  Conclusion 

The detection results obtained on the outdoor dataset are similar to those obtained on the indoor
dataset. Although the outdoor seismic-acoustic environment is quite different from a typical
indoor environment, the copula selection process ensures that the dependence is adequately
modeled. Hence, the detection performance is also superior when compared to the
independence assumption.

Chapter 7

Summary and Future Directions


When monitoring various phenomena, measurements are often nonlinear, non-Gaussian and
dependent. In this dissertation, we have investigated the design and analysis of detection
schemes for one specific class of such data, namely heavy-tailed dependent data. Specifically,
we have considered indoor footstep data, obtained using an array of geophone sensors,
as a representative example for heavy-tailed, dependent data and have developed appropriate
detection schemes.

We used α-stable models to characterize heavy-tailed signals. Stable distributions
are an important class of models in the study of phenomena where extreme-valued
measurements occur with polynomially decaying probability. The co-occurrence of such (rare)
extreme-valued data is sometimes symptomatic of a catastrophic event, and its detection,
therefore, needs appropriate modeling tools. In this dissertation, we have moved in that direction
by addressing the problem of detection for statistically dependent, stable-distributed,
heavy-tailed data.

We proposed a copula-based modeling scheme, which allows for a range of possibilities
in terms of dependence modeling. It allows for the joint modeling of heavy-tailed marginals
with disparate support, and it captures varying degrees of dependence in the tails. The
proposed approach is useful for formulating various detection problems; e.g., it can be used to
discriminate between signals embedded in dependent α-stable noise.
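For reference, the construction behind this scheme follows from Sklar's theorem. For two sensors with continuous marginal c.d.f.s $F_1$, $F_2$ and p.d.f.s $f_1$, $f_2$ (this notation is assumed here and may differ from that of the earlier chapters), the joint density factors as:

```latex
f(x_1, x_2) \;=\; f_1(x_1)\, f_2(x_2)\; c\bigl(F_1(x_1),\, F_2(x_2)\bigr)
```

Here $c$ denotes the copula density. Setting $c \equiv 1$ recovers the product rule, i.e., the independence assumption against which the copula-based detectors are compared.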

We formulated a general two-sided test in the Neyman-Pearson framework, and discussed
the need for copula selection. We used a copula selection scheme which, along with likelihood
ratio test statistics, ensures asymptotic optimality when the copula corresponding to the data
generation process is contained in the copula library. The asymptotic performance of the
detectors was studied in detail. We considered several simulated examples, and demonstrated that
appropriate models lead to significantly superior detection probabilities, $P_D$, especially in the
low $P_F$ regime.

In  order  to  test  our  methodology  on  actual  sensor  data,  we  applied  the  proposed  model  and 
detection  scheme  to  the  problem  of  indoor  footstep  detection.  Using  the  dataset  described  in 
Chapter 2, we showed that the footstep data are modeled well using SαS models. Detector
and copula selection performance was evaluated. Without significant preprocessing, we obtained
$P_D$ > 0.9 in the region $P_F$ > 0.01. The copula functions selected are also consistent with the
observed  nature  of  footstep  signals  in  terms  of  tail  dependence  behavior. 

Finally, we considered the issue of modeling multivariate (i.e., multisensor) dependence.
While the theory of copulas can characterize multivariate statistical dependence in full
generality, most copula functions that have been proposed in the literature are effective only for
bivariate dependence characterization. We used a vine-based approach to express the
multivariate copula density, in which observation pairs from different sensors are combined in a
hierarchical manner. Additionally, we considered different approaches to modeling the
marginal p.d.f.s so that the computational effort of fitting an α-stable distribution is
somewhat mitigated. A GLR test statistic was designed for the footstep detection problem, and the
performance of minimum description length (MDL) based copula selection schemes was
investigated. An important aspect of multivariate modeling is to ascertain which nodes should
be paired at the base tree of the vine-based copula model. We have proposed a tail-dependence
based algorithm for node ordering. Our results on the footstep data show that the proposed
scheme yields significant performance improvement when compared to the case where only
two sensors are used.


7.1  Future  directions 

Based  on  the  modeling  and  detection  problems  addressed  in  this  dissertation,  several  directions 
for  future  research  may  be  identified.  These  ideas  are  discussed  in  this  section. 

7.1.1 Distributed detection with α-stable dependent observations

The proposed detection schemes were implemented as centralized schemes. Further research
will be necessary to port the copula-based approach onto a fully distributed framework.
Iyengar et al. [37] successfully addressed the issue of computational efficiency in a
copula-based approach to fusing dependent sensor decisions in a distributed detection setup by
injecting a controlled amount of noise into the sensor output. The approach in [37] is
especially significant for signals with stable (marginal) distributions, since their analysis is largely
in the characteristic function domain. A joint characteristic function approach, based on
copula theory, would be a natural extension for distributed inference using dependent heavy-tailed
signals. The implications of using vine-based topologies, in the context of joint characteristic
functions, may also be explored.

7.1.2  Sequential  detection 

Sequential detection procedures, as first established by Wald, are essentially a generalization of
the Neyman-Pearson formulation. For a desired size and power, a sequential test procedure
seeks to determine the smallest number of observations, N, that satisfies the false alarm and
power constraints. This number N is, in fact, N(X): it is a function of the random variable
being observed and is called the Wald stopping variable. As with the Neyman-Pearson
formulation, the optimal test statistic is of the form of a (log) likelihood ratio. A notable feature is
that in the sequential framework one can update the test statistic recursively. That is, the test
statistic for a sequence of N observations, $x^N$, can be expressed as a function of the test
statistic computed for the previous N - 1 observations,


$$
T(x^N) = T(x^{N-1}) + \log \frac{f_X^1(x_N \mid x_{1:N-1})}{f_X^0(x_N \mid x_{1:N-1})}
$$


The functions $f_X^1$ and $f_X^0$ correspond to the distributions under the alternative and null
hypotheses, respectively. This detection problem does not assume that the observation sequence
is independent: each $f_X$ is a joint density characterized by a copula and its respective marginals.
This skeletal formulation can be explored in greater detail. The following associated issues
will need to be addressed in this context:


•  Spatio-temporal dependence modeling. The sequential detection problem for dependent
observations, in its most general form, will account for observations from multiple
sensors and time points. Further study is required to identify which models will yield
more insight and allow for tractable analysis. For example, linear models such as ARMA
models, characterizing temporal dependence, may be used in conjunction with
copula-based models for spatial dependence. A copula-based approach to temporal
dependence characterization may also be investigated; however, the type of copula to be
used will remain an important question. Specifically, a copula selection process for a
sequential inference problem may incur computational overheads that lead to intolerable
latencies in the system being designed. These issues will have to be addressed carefully.

•  Effect of dependence on N(X). It will be of interest to quantify the effect of dependence
on the stopping variable. The copula-based dependence characterization will allow
for a greater degree of generality in this analysis.
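The recursive update of the log-likelihood ratio can be sketched as below. This is a minimal illustration under stated assumptions: the names `sequential_test`, `wald_thresholds`, `log_f1`, and `log_f0` are hypothetical, the stopping thresholds are Wald's classical approximations (whose accuracy for dependent observations is not established here), and the Gaussian autoregressive conditional densities are toy stand-ins for the copula-based heavy-tailed models discussed in the text.

```python
import math


def wald_thresholds(p_f, p_m):
    """Wald's approximate stopping thresholds for the log-likelihood ratio.

    p_f is the target false-alarm probability and p_m the target miss
    probability (one minus the power).
    """
    upper = math.log((1.0 - p_m) / p_f)
    lower = math.log(p_m / (1.0 - p_f))
    return lower, upper


def sequential_test(observations, log_f1, log_f0, p_f=0.01, p_m=0.01):
    """Add one conditional log-likelihood-ratio term per new observation.

    log_f1(x, history) and log_f0(x, history) return the log conditional
    density of x given all earlier observations under H1 and H0.
    Returns (decision, number of observations used, final statistic).
    """
    lower, upper = wald_thresholds(p_f, p_m)
    stat = 0.0
    history = []
    for n, x in enumerate(observations, start=1):
        stat += log_f1(x, history) - log_f0(x, history)  # recursive update
        history.append(x)
        if stat >= upper:
            return "H1", n, stat
        if stat <= lower:
            return "H0", n, stat
    return "undecided", len(history), stat


def gaussian_logpdf(x, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


# Toy first-order autoregressive models (an assumption, not from the text):
# H0: x_n = 0.5 x_{n-1} + w_n,   H1: x_n = 1 + 0.5 x_{n-1} + w_n,
# with unit-variance Gaussian w_n and x_0 taken as 0.
def log_f0(x, history):
    prev = history[-1] if history else 0.0
    return gaussian_logpdf(x, 0.5 * prev, 1.0)


def log_f1(x, history):
    prev = history[-1] if history else 0.0
    return gaussian_logpdf(x, 1.0 + 0.5 * prev, 1.0)


decision_a, n_a, _ = sequential_test([2.0] * 10, log_f1, log_f0)
decision_b, n_b, _ = sequential_test([0.0] * 20, log_f1, log_f0)
```

The number of observations returned by `sequential_test` plays the role of the stopping variable N(X): it depends on the realized data, so different sequences stop at different times.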

7.1.3 Misspecified marginals

Robust detection techniques with ε-contamination models are well studied under an IID
assumption. The theory provides techniques to handle distribution uncertainty. In the case of
copula-based detectors, marginal uncertainty or misspecification leads to arguments in the
copula term which are not uniformly distributed. Designing detectors for such situations will
be useful, especially for applications with non-stationary conditions, where there is reason to
believe that the marginals (sensor models) may be perturbed while the nature of the
dependence remains unchanged.
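The loss of uniformity under a misspecified marginal can be illustrated numerically. The sketch below is an assumption-laden toy example rather than an experiment from this dissertation: samples are drawn from a standard Cauchy distribution (the α-stable law with α = 1), passed through the correct c.d.f. and through a wrongly assumed Gaussian c.d.f., and the Kolmogorov-Smirnov distance of each result from the uniform distribution is computed. All function names are hypothetical.

```python
import math
import random


def cauchy_cdf(x, scale=1.0):
    return 0.5 + math.atan(x / scale) / math.pi


def gaussian_cdf(x, sigma=1.0):
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def ks_distance_from_uniform(u):
    """Kolmogorov-Smirnov distance between the sample u and Uniform(0, 1)."""
    u = sorted(u)
    n = len(u)
    return max(max((i + 1) / n - v, v - i / n) for i, v in enumerate(u))


rng = random.Random(0)
# Standard Cauchy samples via the inverse-c.d.f. method.
samples = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(5000)]

# Copula arguments under the correct and the misspecified marginal model.
ks_correct = ks_distance_from_uniform([cauchy_cdf(x) for x in samples])
ks_wrong = ks_distance_from_uniform([gaussian_cdf(x) for x in samples])
```

With the correct marginal the copula arguments are close to uniform, whereas the Gaussian assumption pushes mass toward 0 and 1, which is the situation a robust copula-based detector would have to tolerate.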

7.1.4  Bootstrap-based  detection  for  dependent  observations 

As observed in Chapter 4, the distribution of the test statistic for a detection problem can be
difficult to obtain. Zoubir and Iskander [112] discuss bootstrap techniques for various signal
processing problems. Bootstrap-based detectors are attractive for small-sample copula-based
detection as they allow for a non-parametric methodology for threshold determination. As
an initial approach for handling correlated data, Zoubir and Iskander consider an
autoregressive model and discuss resampling techniques for this specific case of dependent observations.
Efron and Tibshirani [28] discuss how the bootstrap implicitly samples from the empirical
probability distribution. Assuming temporal independence, we can resample observations from
within each sensor, using existing bootstrap theory. Sampling separately from the appropriate
copula density will create a pool of random vectors whose dependence structure is unaltered.
This idea can be used to design a bootstrap-based non-parametric scheme for automatic
threshold selection under the dependence regime.
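One possible reading of this resampling idea is sketched below for two sensors. It is an assumption-heavy illustration, not a method developed in this dissertation: a Gaussian copula with a hypothetical parameter `rho` stands in for "the appropriate copula", each copula draw is mapped through the empirical quantile function of the corresponding sensor (which amounts to resampling that sensor's observations), and the function names are invented for this example.

```python
import math
import random


def gaussian_copula_sample(rho, rng):
    """Draw one pair (u1, u2) from a bivariate Gaussian copula."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return phi(z1), phi(z2)


def empirical_quantile(sorted_data, u):
    """Inverse of the empirical c.d.f.; always returns an observed value."""
    idx = min(int(u * len(sorted_data)), len(sorted_data) - 1)
    return sorted_data[idx]


def copula_bootstrap(sensor1, sensor2, rho, n_draws, rng):
    """Build a pool of resampled pairs with copula-induced dependence.

    Marginals come from resampling each sensor's own observations; the
    dependence between the two coordinates comes from the copula draws.
    """
    s1, s2 = sorted(sensor1), sorted(sensor2)
    pool = []
    for _ in range(n_draws):
        u1, u2 = gaussian_copula_sample(rho, rng)
        pool.append((empirical_quantile(s1, u1), empirical_quantile(s2, u2)))
    return pool


# Toy sensor records (Gaussian stand-ins, not footstep data).
rng = random.Random(1)
sensor1 = [rng.gauss(0.0, 1.0) for _ in range(200)]
sensor2 = [rng.gauss(0.0, 2.0) for _ in range(200)]
pool = copula_bootstrap(sensor1, sensor2, rho=0.8, n_draws=2000, rng=rng)
```

A threshold could then be chosen by recomputing the test statistic on replicates drawn from such a pool built from null-hypothesis data and taking the empirical quantile that matches the desired false alarm probability.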
7.2 Some additional open problems

The  ideas  contained  in  Section  7.1  emerged  as  research  topics  motivated  by  the  discussion  in 
Chapter  4  and  Chapter  5.  There  are,  however,  fundamental  problems  of  theoretical  interest  that 
arise  as  a  result  of  copula-based  dependence  modeling.  These  are  enumerated  below. 

1.  System identification. In attempting to derive the distribution of a copula-based
statistic, one soon finds that the theory of functions of dependent random variables,
as parametrized by copula functions, is still largely unexplored. Only recently has
Cherubini [18] defined the copula-based convolution to derive the distribution of the sum
of bounded and unbounded dependent random variables. Given that the behavior of
linear time-invariant systems for independent inputs is well developed, can a similar theory
be developed, using the copula framework, for a system accepting dependent inputs?

2.  Performance bounds and dependency measures. For copula-based inference, it is
known that selecting an incorrect copula can penalize performance. Iyengar et al. [38]
note that a copula-based model does not necessarily perform better than one assuming
independence; the necessity for copula selection arises from this observation. However,
can this be quantified in terms of the various concepts and measures of dependence that
were discussed in Chapter 3? In other words, it would be of interest to investigate whether
certain performance guarantees are available based on the different types of dependence,
such as positive quadrant dependence (PQD) and likelihood ratio dependence (LRD).